We provide a general method to analyze the asymptotic properties
of a variety of estimators of continuous time diffusion processes
when the data are not only discretely sampled in time but the time
separating successive observations may possibly be random. We
introduce a new operator, the generalized infinitesimal generator, to
obtain Taylor expansions of the asymptotic moments of the estimators.
As a special case, our results apply to the situation where the
data are discretely sampled at a fixed nonrandom time interval. We
include as specific examples estimators based on maximum-likelihood
and discrete approximations such as the Euler scheme.


1. Introduction. Most theoretical models in finance are spelled out in
continuous time [see, e.g., Merton (1992)], whereas the observed data are,
by nature, discretely sampled in time. Estimating these models from
discrete time observations has become in recent years an active area of
research in statistics and econometrics, and a number of estimation procedures
have been proposed in the context of parametric models for continuous-time
Markov processes, often in the special case of diffusions. Not only are the
observations sampled discretely in time, but it is often the case with
financial data that the time separating successive observations is itself random,
as illustrated, for example, in Figure 1 of Aït-Sahalia and Mykland [(2003),
page 484].

This earlier paper focused on the case of inference with the help of
likelihood. For data of the type we consider, however, it is common to use a
variety of estimating equations, of which likelihood is only one instance; see,
for example, Hansen and Scheinkman (1995), Aït-Sahalia (1996, 2002) and
Bibby, Jacobsen and Sørensen (2004). Our objective in this paper is to carry
out a detailed analysis of the asymptotic properties of a large class of such
estimators in the context of discretely and randomly sampled data. Unlike
Aït-Sahalia and Mykland (2003), it will also permit the diffusion function
to depend on both the parameter and the data.

We model this situation as follows. Suppose that we observe the process

(1)  $dX_t = \mu(X_t;\theta)\,dt + \sigma(X_t;\gamma)\,dW_t$

at discrete times in the interval $[0,T]$, and we wish to estimate the
parameters $\theta$ and/or $\gamma$. We call the observation times
$\tau_0 = 0, \tau_1, \tau_2, \ldots, \tau_{N_T}$, where
$N_T$ is the smallest integer such that $\tau_{N_T+1} > T$. Because the properties of
estimators vary widely depending upon whether the drift or the diffusion
parameters, or both, are estimated, we consider the three cases of estimating
$\beta = (\theta,\gamma)$ jointly, $\beta = \theta$ with $\gamma$ known or
$\beta = \gamma$ with $\theta$ known. In regular circumstances, $\hat\beta$
converges in probability to some $\bar\beta$ and $\sqrt{T}(\hat\beta - \bar\beta)$
converges in law to $N(0,\Omega_\beta)$ as $T$ tends to infinity.

For each estimator, the corresponding $\Omega_\beta$ and, when applicable, the bias
$\bar\beta - \beta_0$, depend on the transition density of the diffusion process, which is
generally unknown in closed form. Our solution is to derive Taylor
expansions for the asymptotic variance and bias starting with a leading term that
corresponds to the limiting case where the sampling is continuous in time.
Our main results deliver closed form expressions for the terms of these
Taylor expansions. For that purpose, we introduce a new operator, which we
call the generalized infinitesimal generator of the diffusion.

Specifically, we write the law of the sampling intervals as

(2)  $\Delta = \varepsilon\Delta_0$,

where $\Delta_0$ has a given finite distribution and $\varepsilon$ is deterministic. Our Taylor
expansions take the form

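(3)  $\Omega_\beta = \Omega_\beta^{(0)} + \varepsilon\,\Omega_\beta^{(1)} + \varepsilon^2\,\Omega_\beta^{(2)} + O(\varepsilon^3)$,

(4)  $\bar\beta - \beta_0 = \varepsilon\,b^{(1)} + \varepsilon^2\,b^{(2)} + O(\varepsilon^3)$.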

While the limiting term as $\varepsilon$ goes to zero corresponds to continuous
sampling, by adding higher-order terms in $\varepsilon$, we progressively correct this
leading term for the discreteness of the sampling. The two equations (3) and
(4) can then be used to analyze the relative merits of different estimation
approaches, by comparing the order in $\varepsilon$ at which various effects manifest
themselves, and when they are equal, the relative magnitudes of the
corresponding coefficients in the expansion.

Because the coefficients of the expansions depend upon the distribution of
the sampling intervals, we can also use these expressions to assess the effect
of different sampling patterns on the overall properties of estimators.
Moreover, our results apply not only to random sampling, but also to the situation
where the sampling interval is time-varying in a deterministic manner (see
Section 5.3), or to the case where the sampling interval is simply fixed, in
which case we just need to set $\mathrm{Var}[\Delta_0] = 0$ in all our expressions. One
particular example is indeed sampling at a deterministic fixed time interval, such
as, say, daily or weekly, which is the setup adopted by much of the recent
literature on discretely observed diffusions [see, e.g., Hansen and Scheinkman
(1995), Aït-Sahalia (1996, 2002) and Bibby, Jacobsen and Sørensen (2004)].

The paper is organized as follows. Section 2 sets up the model and the 
assumptions used throughout the paper. Section 3 develops a general theory 
that establishes the asymptotic properties of a large class of estimators of 
parametric diffusions and their Taylor expansions. Section 4 applies these 
results to two specific examples of estimating equations: first, the maximum 
likelihood estimator; second, the Euler approximate discrete scheme based 
on a Gaussian likelihood. Our conclusions also carry over to the maximum
likelihood-type estimators discussed in Aït-Sahalia and Mykland (2003). We
discuss extensions of the theory in Section 5. Proofs are contained in
Section 6. Section 7 concludes.

2. Data structure and inference scheme. 


2.1. The process and the sampling. We let $\mathcal{S} = (\underline{x},\bar{x})$ denote the domain
of the diffusion $X_t$. In general, $\mathcal{S} = (-\infty,+\infty)$, but in many examples in
finance, we are led to consider variables such as nominal interest rates, in
which case $\mathcal{S} = (0,+\infty)$. Whenever we are estimating parameters, we will
take the parameter space for the $d$-dimensional vector $\beta$ to be an open and
bounded set. We will make use of the scale and speed densities of the process,
defined as

(5)  $s(x;\beta) = \exp\{-2\int^{x} (\mu(y;\theta)/\sigma^2(y;\gamma))\,dy\}$,

(6)  $m(x;\beta) = 1/(\sigma^2(x;\gamma)\,s(x;\beta))$

and the scale and speed measures $S(x;\beta) = \int^{x} s(w;\beta)\,dw$ and
$M(x;\beta) = \int^{x} m(w;\beta)\,dw$. The lower bound of integration is an arbitrary
point in the interior of $\mathcal{S}$. We also define the same increasing transformation
as in Aït-Sahalia (2002):

(7)  $g(x;\gamma) = \int^{x} \frac{du}{\sigma(u;\gamma)}$.


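We assume below conditions that make this transformation well defined. By
Itô's lemma, $\tilde X_t = g(X_t;\gamma)$ defined on
$\tilde{\mathcal{S}} = (g(\underline{x};\gamma), g(\bar{x};\gamma))$ satisfies
$d\tilde X_t = \tilde\mu(\tilde X_t;\beta)\,dt + dW_t$ with

$\tilde\mu(\tilde x;\beta) = \frac{\mu(g^{inv}(\tilde x;\gamma);\theta)}{\sigma(g^{inv}(\tilde x;\gamma);\gamma)} - \frac{1}{2}\frac{\partial\sigma}{\partial x}(g^{inv}(\tilde x;\gamma);\gamma)$,

where $g^{inv}$ denotes the reciprocal transformation. We also define the scale
and speed densities of $\tilde X$, $\tilde s$ and $\tilde m$, and
$\lambda(\tilde x;\beta) = -(\tilde\mu(\tilde x;\beta)^2 + \partial\tilde\mu(\tilde x;\beta)/\partial\tilde x)/2$.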

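We make the following primitive assumptions on $(\mu,\sigma)$:

Assumption 1. For all values of the parameters $(\theta,\gamma)$ we have the
following:

1. Differentiability: The functions $\mu(x;\theta)$ and $\sigma(x;\gamma)$ are infinitely
differentiable in $x$.

2. Nondegeneracy of the diffusion: If $\mathcal{S} = (-\infty,+\infty)$, there exists a constant
$c$ such that $\sigma(x;\gamma) > c > 0$ for all $x$ and $\gamma$. If $\mathcal{S} = (0,+\infty)$,
$\lim_{x\to 0^+}\sigma^2(x;\gamma) = 0$ is possible but then there exist constants
$\xi_0 > 0$, $\omega > 0$, $\rho \ge 0$ such that
$\sigma^2(x;\gamma) \ge \omega x^{\rho}$ for all $0 < x \le \xi_0$ and $\gamma$. Whether or not
$\lim_{x\to 0^+}\sigma^2(x;\gamma) = 0$, $\sigma$ is nondegenerate in the interior of
$\mathcal{S}$, that is, for each $\xi > 0$, there exists a constant $c_\xi$ such that
$\sigma^2(x;\gamma) \ge c_\xi > 0$ for all $x \in [\xi,+\infty)$ and $\gamma$.

3. Boundary behavior: $\mu$, $\sigma^2$ and their derivatives have at most polynomial
growth in $x$ near the boundaries, $\lim_{x\to\underline{x}} S(x;\beta) = -\infty$ and
$\lim_{x\to\bar{x}} S(x;\beta) = +\infty$,

(8)  $\liminf_{\tilde x\to\underline{\tilde x}} \tilde\mu(\tilde x;\beta) > 0$ and $\limsup_{\tilde x\to\bar{\tilde x}} \tilde\mu(\tilde x;\beta) < 0$

and

(9)  $\limsup_{\tilde x\to\underline{\tilde x}\ \text{or}\ \tilde x\to\bar{\tilde x}} \lambda(\tilde x;\beta) < +\infty$.

4. Identification: $\mu(x;\theta) = \mu(x;\tilde\theta)$ for $\pi$-almost all $x$ in $\mathcal{S}$
implies $\theta = \tilde\theta$, and $\sigma^2(x;\gamma) = \sigma^2(x;\tilde\gamma)$ for
$\pi$-almost all $x$ in $\mathcal{S}$ implies $\gamma = \tilde\gamma$.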


Under Assumption 1, the stochastic differential equation (1) admits a
weak solution which is unique in probability law. This follows from the
Engelbert-Schmidt criterion [see, e.g., Theorem 5.5.15 in Karatzas and Shreve
(1991), replacing $\mathbb{R}$ by $\mathcal{S}$ throughout], with explosions ruled out by the
boundary behavior of the process. The divergence of the scale measure makes
the boundaries unattainable, since it implies that, for $x$ in the interior
of $\mathcal{S}$,

$\Sigma_{\underline{x}} = \int_{\underline{x}}^{x} \{\int_{\underline{x}}^{u} s(v;\beta)\,dv\}\, m(u;\beta)\,du = \infty$,

$\Sigma_{\bar{x}} = \int_{x}^{\bar{x}} \{\int_{u}^{\bar{x}} s(v;\beta)\,dv\}\, m(u;\beta)\,du = \infty$.






Given that it is unattainable (i.e., given that $\Sigma_{\underline{x}} = \infty$), the boundary
$\underline{x}$ is natural when $N_{\underline{x}} = \infty$ and entrance when
$N_{\underline{x}} < \infty$, where

$N_{\underline{x}} = \int_{\underline{x}}^{x} \{\int_{\underline{x}}^{u} m(v;\beta)\,dv\}\, s(u;\beta)\,du$,

and similarly for the boundary $\bar{x}$ [see, e.g., Section 15.6 in Karlin and Taylor
(1981)]. If both boundaries are entrance, then the integrability assumption
on the speed measure $m$ will automatically be satisfied. When one of the
boundaries is natural, integrability of $m$ is neither implied nor precluded.

Condition (8), however, guarantees that the process $X$ will be stationary:
with $b$ denoting the limsup in (8), we have near the right boundary

$\tilde m(\tilde x;\beta) = \exp\{2\int^{\tilde x} \tilde\mu(y;\beta)\,dy\} \le c\,\exp\{2b\tilde x\}$,

where $c$ is a constant, and similarly near the left boundary. Thus, $\tilde m$ is
integrable, and it follows that $m$ is integrable. Therefore, the process $X$ is
stationary with stationary density

(10)  $\pi(x,\beta) = \frac{m(x;\beta)}{\int_{\underline{x}}^{\bar{x}} m(y;\beta)\,dy}$,

provided that the initial value of the process, $X_0$, has density $\pi$, which we
will assume in the rest of the paper. Furthermore, condition (8)
guarantees that $X$ has exponentially decaying $\rho$-mixing coefficients (see Lemma
4). Condition (9) guarantees the regularity of the transition density of the
process [see Proposition 2 in Aït-Sahalia (2002)]; note that the condition
does not prevent $\lambda$ from going to $-\infty$, it only excludes $+\infty$ as a possible
limit.
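
As an illustration of (5), (6) and (10), the following sketch computes the stationary density numerically from the speed density. It is not part of the original analysis: the Ornstein-Uhlenbeck specification $\mu(x;\theta) = -\theta x$, $\sigma^2 = \gamma$, the grid, and the function name `stationary_density` are all assumptions chosen because the answer is known in closed form (a Gaussian density with variance $\gamma/(2\theta)$).

```python
import math

def stationary_density(mu, sigma2, grid):
    """Normalized speed density on a uniform grid (a sketch of (5), (6), (10)).

    log s(x) = -2 * integral of mu/sigma2 from grid[0] to x (trapezoid rule);
    m(x) = 1 / (sigma2(x) * s(x)); the result is m divided by its integral.
    The reference point grid[0] is arbitrary: it cancels in the normalization.
    """
    step = grid[1] - grid[0]
    ratio = [mu(x) / sigma2(x) for x in grid]
    log_s = [0.0]
    for k in range(1, len(grid)):
        log_s.append(log_s[-1] - step * (ratio[k - 1] + ratio[k]))
    log_m = [-math.log(sigma2(x)) - ls for x, ls in zip(grid, log_s)]
    top = max(log_m)                       # rescale to avoid overflow
    m = [math.exp(v - top) for v in log_m]
    total = step * (sum(m) - 0.5 * (m[0] + m[-1]))
    return [v / total for v in m]

# Hypothetical example: OU drift with theta = 1 and constant sigma^2 = 0.25.
theta, gamma = 1.0, 0.25
grid = [-3.0 + 6.0 * k / 2000 for k in range(2001)]
pi = stationary_density(lambda x: -theta * x, lambda x: gamma, grid)
```

In this example the value of `pi` at the grid midpoint can be compared with the Gaussian density at zero, $1/\sqrt{2\pi\gamma/(2\theta)}$.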

We will denote by $L^2$ the Hilbert space of measurable real-valued functions
$f$ on $\mathcal{S}$ such that $\|f\|^2 = E[f(X_0)^2] < \infty$ for all values of $\beta$. When $f$ is a
function of other variables, in addition to the state variable $y_1$, we say that
$f \in L^2$ if it satisfies the integrability condition for every given value of the
other variables.

To be able to give specific results on the effects of the sampling
randomness on the estimation of $\beta$, we need to put some structure on the generation
of the sampling intervals $\Delta_n = \tau_n - \tau_{n-1}$. We set $Y_n = X_{\tau_n}$. We assume the
following regarding the data generating process for the sampling intervals:

Assumption 2. The sampling intervals $\Delta_n = \tau_n - \tau_{n-1}$ are independent
and identically distributed. Each $\Delta_n$ is drawn from a common distribution
which is independent of $Y_{n-1}$ and of the parameter $\beta$. Also, $E[\Delta_0] < +\infty$.

In particular, $E[f(Y_1)^2] = \|f\|^2$. An important special case occurs when
the sampling happens to take place at a fixed deterministic interval $\bar\Delta$,
corresponding to the distribution of $\Delta_n$ being a Dirac mass at $\bar\Delta$. See Section
5.3 for extensions. Throughout the paper we denote by $\Delta$ a generic random
variable with the common distribution of the $\Delta_n$'s.

While we assume that the distribution of the sampling intervals is
independent of $\beta$, it may well depend upon its own nuisance parameters (such
as an unknown arrival rate), but we are not interested in drawing inference
about the (nuisance) parameters driving the sampling scheme, only about
$\beta$. Note also that in the case of random sampling times, the number $N_T + 1$
of observations in the interval $[0,T]$ will be random.
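
The sampling scheme of Assumption 2 can be sketched in a few lines. The example below is illustrative only: the Ornstein-Uhlenbeck model, the exponential law for $\Delta_0$, and the function name `simulate_ou_random_sampling` are assumptions, not choices made in the text. It draws i.i.d. intervals $\Delta_n = \varepsilon\Delta_0$ independently of the state, uses the exact Gaussian transition of the process, and stops as soon as the next sampling time would exceed $T$.

```python
import math
import random

def simulate_ou_random_sampling(theta, sigma, eps, T, seed=0):
    """Sample dX = -theta*X dt + sigma dW at random times in [0, T].

    Intervals are Delta_n = eps * Delta_0 with Delta_0 ~ Exponential(1)
    (an illustrative assumption). Returns (taus, ys) with taus[0] = 0
    and ys[n] the state observed at taus[n].
    """
    rng = random.Random(seed)
    stat_var = sigma ** 2 / (2.0 * theta)        # variance of the stationary law
    taus = [0.0]
    ys = [rng.gauss(0.0, math.sqrt(stat_var))]   # X_0 drawn from pi
    while True:
        delta = eps * rng.expovariate(1.0)       # independent of the state
        if taus[-1] + delta > T:                 # next time would exceed T
            break
        decay = math.exp(-theta * delta)
        cond_sd = math.sqrt(stat_var * (1.0 - decay ** 2))
        ys.append(rng.gauss(ys[-1] * decay, cond_sd))  # exact OU transition
        taus.append(taus[-1] + delta)
    return taus, ys

taus, ys = simulate_ou_random_sampling(theta=1.0, sigma=0.5, eps=0.1, T=50.0)
n_T = len(taus) - 1   # random number of sampling intervals in [0, T]
```

With $E[\Delta] = 0.1$ and $T = 50$, `n_T` fluctuates around $T/E[\Delta] = 500$ from one seed to the next, which is the randomness in the number of observations noted above.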

2.2. The estimators and their distribution. We consider a class of
estimators for $\beta$ obtained by minimizing a criterion function. Specifically, to
estimate the $d$-dimensional parameter vector $\beta$, we select a vector of $r$
moment conditions $h(y_1,y_0,\delta,\beta,\varepsilon)$, $r \ge d$, which is continuously differentiable
in $\beta$. We form the sample average

(11)  $m_T(\beta) = N_T^{-1} \sum_{n=1}^{N_T-1} h(Y_n,Y_{n-1},\Delta_n,\beta,\varepsilon)$

and obtain $\hat\beta$ by minimizing the quadratic form

(12)  $Q_T(\beta) = m_T(\beta)'\,W_T\,m_T(\beta)$,

where $W_T$ is an $r \times r$ positive definite weight matrix assumed to converge in
probability to a positive definite limit $W_\beta$. If the system is exactly identified,
$r = d$, the choice of $W_T$ is irrelevant and minimizing (12) amounts to setting
$m_T(\beta)$ to 0. The function $h$ is known in different strands of the literature
either as a “moment function” [see, e.g., Hansen (1982)] or an “estimating
equation” [see, e.g., Godambe (1960) and Heyde (1997)].

A natural requirement on the estimating equation (what is needed for
consistency of $\hat\beta$) is that

(13)  $E_{\Delta,Y_1,Y_0}[h(Y_1,Y_0,\Delta,\beta_0,\varepsilon)] = 0$.

Throughout the paper we denote by $E_{\Delta,Y_1,Y_0}$ expectations taken with respect
to the joint law of $(\Delta,Y_1,Y_0)$ at the true parameter $\beta_0$, and write $E_{\Delta,Y_1}$,
and so on, for expectations taken from the appropriate marginal laws of
$(\Delta,Y_1)$, and so on. As will become clear in the Euler example below, some
otherwise fairly natural estimating strategies lead to inconsistent estimators.
To allow for this, we do not assume that (13) is necessarily satisfied. Rather,
we simply assume that the equation $E_{\Delta,Y_1,Y_0}[h(Y_1,Y_0,\Delta,\beta,\varepsilon)] = 0$ admits a
unique root in $\beta$, which we define as $\bar\beta = \bar\beta(\beta_0,\varepsilon)$.

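With $N_T/T$ converging in probability to $(E[\Delta])^{-1}$, it follows from
standard arguments that $\sqrt{T}(\hat\beta - \bar\beta)$ converges in law to $N(0,\Omega_\beta)$, with

(14)  $\Omega_\beta^{-1} = (E[\Delta])^{-1} D_\beta'\,S_\beta^{-1}\,D_\beta$,

where

$D_\beta = E_{\Delta,Y_1,Y_0}[\dot h(Y_1,Y_0,\Delta,\bar\beta,\varepsilon)]$,

$S_{\beta,j} = E_{\Delta,Y_1,Y_0}[h(Y_{1+j},Y_j,\Delta,\bar\beta,\varepsilon)\,h(Y_1,Y_0,\Delta,\bar\beta,\varepsilon)']$,

and $S_\beta = \sum_{j=-\infty}^{+\infty} S_{\beta,j}$ (a dot denotes differentiation with respect
to the parameter). If $r > d$, the weight matrix $W_T$ is assumed to be any
consistent estimator of $S_\beta^{-1}$; otherwise its choice is irrelevant. A consistent
first-step estimator of $\beta$, needed to compute the optimal weight matrix, can
be obtained by minimizing (12) with $W_T = \mathrm{Id}$.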
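
To make the criterion (11)-(12) concrete, the sketch below estimates a scalar diffusion parameter by minimizing the quadratic form numerically. Everything specific in it is an assumption made for illustration: the Ornstein-Uhlenbeck data with $\theta$ treated as known, the single martingale moment condition based on the exact conditional variance, the plain average over all observed transitions, and the helper names `sample_moments`, `quadratic_form` and `golden_section_min`.

```python
import math
import random

def sample_moments(h, ys, deltas, beta):
    """Average of the moment vector h(y_n, y_{n-1}, Delta_n, beta) over n."""
    n = len(deltas)
    m = [0.0] * len(h(ys[1], ys[0], deltas[0], beta))
    for k in range(n):
        for i, value in enumerate(h(ys[k + 1], ys[k], deltas[k], beta)):
            m[i] += value / n
    return m

def quadratic_form(m, W):
    """Q = m' W m for a moment vector m and a weight matrix W."""
    r = len(m)
    return sum(m[i] * W[i][j] * m[j] for i in range(r) for j in range(r))

def golden_section_min(f, lo, hi, iters=100):
    """Minimize a unimodal scalar function on [lo, hi]."""
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    for _ in range(iters):
        c = b - ratio * (b - a)
        d = a + ratio * (b - a)
        if f(c) < f(d):
            b = d
        else:
            a = c
    return 0.5 * (a + b)

# Toy data: dX = -theta*X dt + sqrt(gamma_0) dW with random sampling intervals.
rng = random.Random(1)
theta, gamma_0, eps = 1.0, 0.25, 0.1
deltas = [eps * rng.expovariate(1.0) for _ in range(2000)]
ys = [rng.gauss(0.0, math.sqrt(gamma_0 / (2.0 * theta)))]
for delta in deltas:
    decay = math.exp(-theta * delta)
    sd = math.sqrt(gamma_0 * (1.0 - decay ** 2) / (2.0 * theta))
    ys.append(rng.gauss(ys[-1] * decay, sd))

def h(y1, y0, delta, gamma):
    """One moment condition (r = d = 1): squared innovation minus its mean."""
    decay = math.exp(-theta * delta)
    cond_var = gamma * (1.0 - decay ** 2) / (2.0 * theta)
    return [(y1 - y0 * decay) ** 2 - cond_var]

W = [[1.0]]   # irrelevant here because the system is exactly identified

def Q(gamma):
    return quadratic_form(sample_moments(h, ys, deltas, gamma), W)

gamma_hat = golden_section_min(Q, 0.01, 2.0)
```

Because $r = d$ in this toy case, minimizing $Q_T$ is the same as solving $m_T(\gamma) = 0$, so `gamma_hat` should agree with the closed-form root of the moment condition; with $r > d$ the weight matrix would matter and a two-step procedure would be needed.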

3. Expansion of the asymptotic variance: general results. 

3.1. The generalized infinitesimal generator. As we just saw, the
asymptotic distributions of the estimators depend upon expectations of matrices
of the form $E_{\Delta,Y_1,Y_0}[f(Y_1,Y_0,\Delta,\bar\beta,\varepsilon)]$. Because these expectations are not
generally available in closed form, our approach is based on calculating
Taylor expansions in $\varepsilon$ of these matrices. The key aspect of our approach is that
these Taylor expansions all happen to be fully explicit.

To calculate Taylor expansions in $\varepsilon$ of the asymptotic variances when the
sampling intervals are random, we introduce the generalized infinitesimal
operator $\Gamma_{\beta_0}$ for the process $X$ in (1). This is in analogy to the development
in Aït-Sahalia and Mykland (2003), but permits our current more general
form of $\sigma^2(x;\gamma)$. To define this operator, let us first recall a standard concept.
The standard infinitesimal generator $A_{\beta_0}$ is the operator which returns

(15)  $A_{\beta_0}\cdot f = \frac{\partial f}{\partial\delta} + \mu(y_1;\theta_0)\frac{\partial f}{\partial y_1} + \frac{1}{2}\sigma^2(y_1;\gamma_0)\frac{\partial^2 f}{\partial y_1^2}$

when applied to functions $f$ that are continuously differentiable once in $\delta$,
twice in $y_1$ and such that $\partial f/\partial y_1$ and $A_{\beta_0}\cdot f$ are both in $L^2$ and satisfy

(16)  $\lim_{y_1\to\underline{x}} \frac{\partial f/\partial y_1}{s(y_1;\beta)} = \lim_{y_1\to\bar{x}} \frac{\partial f/\partial y_1}{s(y_1;\beta)} = 0$.

We define $\mathcal{D}$ to be the set of functions $f$ which have these properties
and are additionally continuously differentiable in $\beta$ and $\varepsilon$. For instance,
functions $f$ that are polynomial in $y_1$ near the boundaries of $\mathcal{S}$, and their
iterates by repeated application of the generator, retain their polynomial
growth characteristic near the boundaries; so they are all in $L^2$ and
satisfy (16). Near both boundaries, polynomials and their iterates diverge at
most polynomially (under Assumption 1, $\mu$, $\sigma^2$ and their derivatives have
at most polynomial growth; multiplying and adding functions with
polynomial growth yields a function still with polynomial growth). But we will
often have exponential divergence of $s(y_1;\beta)$. This would be the case, for
instance, if the left boundary is $\underline{x} = -\infty$, and there exist constants $E > 0$ and
$K > 0$ such that for all $x < -E$ and $(\theta,\gamma)$,
$\mu(x;\theta)/\sigma^2(x;\gamma) \ge K|x|^{\alpha}$ for some
$\alpha > 0$; and if the right boundary is $\bar{x} = +\infty$, there exist constants $E > 0$ and
$K > 0$ such that for all $x > E$ and $(\theta,\gamma)$, $\mu(x;\theta)/\sigma^2(x;\gamma) \le -Kx^{\alpha}$ for some
$\alpha > 0$; if instead $\underline{x} = 0^+$, there exist constants $E > 0$ and $K > 0$ such that
for all $0 < x < E$ and $(\theta,\gamma)$, $\mu(x;\theta)/\sigma^2(x;\gamma) \ge Kx^{-\phi}$ for some $\phi > 1$ and
$K > 0$. If, however, $\phi = 1$ and $K > 1$, then Assumption 1 is still satisfied,
but $s$ diverges only polynomially near $0^+$.

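Our new operator is then defined by its action on $f \in \mathcal{D}$:

(17)  $\Gamma_{\beta_0}\cdot f = \Delta_0\,A_{\beta_0}\cdot f + \frac{\partial f}{\partial\varepsilon} + \frac{\partial f}{\partial\beta}\frac{\partial\bar\beta}{\partial\varepsilon}$.

Note that when $\Delta_0$ is random, our operator $\Gamma_{\beta_0}$ is also random since it
depends on $\Delta_0$. The last term allows for the fact that $\bar\beta$ can be a function
of $(\beta_0,\varepsilon)$. Because we will need to apply repeatedly the operator $\Gamma_{\beta_0}$, let us
define $\mathcal{D}^J$ as the set of functions $f$ with $J+2$ continuous derivatives in $\delta$,
$2(J+2)$ in $y_1$, such that $f$ and its first $J$ iterates by repeated applications
of $A_{\beta_0}$ all remain in $\mathcal{D}$ and additionally have $J+2$ continuous derivatives
in $\beta$ and $\varepsilon$.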
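
The way the generator produces explicit Taylor expansions can be checked numerically in a simple case. The sketch below is an illustration under stated assumptions, not part of the original development: it takes an Ornstein-Uhlenbeck process, a function $f(y_1) = y_1^2$ that does not depend on $\delta$, $\beta$ or $\varepsilon$ (so that $\Gamma_{\beta_0}\cdot f$ reduces to $\Delta_0 A_{\beta_0}\cdot f$), approximates the derivatives in (15) by central finite differences, and compares the expansion $f + \delta A\cdot f + (\delta^2/2)A^2\cdot f$ with the exact conditional expectation. The helper names are hypothetical.

```python
import math

def generator(f, mu, sigma2, step=1e-2):
    """Return y -> (A f)(y) = mu(y) f'(y) + 0.5 sigma2(y) f''(y),
    with derivatives approximated by central finite differences."""
    def Af(y):
        d1 = (f(y + step) - f(y - step)) / (2.0 * step)
        d2 = (f(y + step) - 2.0 * f(y) + f(y - step)) / step ** 2
        return mu(y) * d1 + 0.5 * sigma2(y) * d2
    return Af

theta, gamma = 1.0, 0.25             # illustrative parameter values

def mu(y):
    return -theta * y

def sigma2(y):
    return gamma

def f(y):
    return y * y

Af = generator(f, mu, sigma2)        # first iterate A.f
AAf = generator(Af, mu, sigma2)      # second iterate A^2.f

def taylor(y0, delta, order):
    """Expansion of E[f(Y_1) | Y_0 = y0, Delta = delta] up to the given order."""
    terms = [f(y0), delta * Af(y0), 0.5 * delta ** 2 * AAf(y0)]
    return sum(terms[: order + 1])

def exact(y0, delta):
    """Exact conditional second moment of the OU process."""
    decay = math.exp(-2.0 * theta * delta)
    return y0 * y0 * decay + gamma / (2.0 * theta) * (1.0 - decay)

y0, delta = 0.7, 0.05
err_first = abs(exact(y0, delta) - taylor(y0, delta, 1))
err_second = abs(exact(y0, delta) - taylor(y0, delta, 2))
```

Adding the second-order term reduces the approximation error (`err_second` is smaller than `err_first`), in line with the idea of progressively correcting the continuous-sampling limit.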

3.2. Behavior for small $\varepsilon$ of the estimating equations. The limiting
behavior of the vector $h$ of moment functions depends crucially on whether one
is estimating separately $\theta$, $\gamma$ or both together. If one is only estimating the
drift parameters $\theta$, it will typically be the case that $h(y_1,y_0,\delta,\beta,\varepsilon)$ can be
Taylor expanded around its continuous-sampling limit $h(y_0,y_0,0,\beta,0)$. On
the other hand, when estimating $\gamma$, we will see that such a Taylor expansion
is not possible, and $h(y_1,y_0,\delta,\beta,\varepsilon)$ is instead of order $O_p(1)$ as $\varepsilon \to 0$ (and
naturally $y_1 \to y_0$ and $\delta \to 0$ at the same time). We shall in the following
describe a structure which is consistent with all the various estimators of $\beta$
we consider, and is applicable to others as well.

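We assume the following regularity condition regarding the moment
functions $h$ selected to conduct inference:

Assumption 3. $h(y_1,y_0,\delta,\beta,\varepsilon) \in \mathcal{D}^J$ for some $J > 3$. We shall in general
consider moment functions $h$ of the form

(18)  $h(y_1,y_0,\delta,\beta,\varepsilon) = \tilde h(y_1,y_0,\delta,\beta,\varepsilon) + \frac{H(y_1,y_0,\delta,\beta,\varepsilon)}{\delta}$,

where $\tilde h$ and $H$ each belong to a class $\mathcal{D}^J$ of the type defined in
Section 3.1. When the function $H$ is not identically zero,
we add the requirements that

(19)  $H(y_0,y_0,0,\beta_0,0) = \frac{\partial H(y_0,y_0,0,\beta_0,0)}{\partial y_1} = \dot H(y_0,y_0,0,\beta_0,0) = 0$.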
In our definition of $h$, the term $H$ captures the singularity (i.e., powers of
$1/\delta$) which can occur when estimating the diffusion coefficient. Consider, for
example, the case where $h$ is the likelihood score for $\gamma$. The log-likelihood
expansion in Aït-Sahalia (2002) is, for the transformed process $\tilde X$,

$\tilde l(\tilde y_1,\tilde y_0,\delta,\beta,\varepsilon) = -\frac{1}{2}\ln(2\pi\delta) - \frac{(\tilde y_1 - \tilde y_0)^2}{2\delta} + \text{higher-order terms}$.

By the Jacobian formula, the log-likelihood expansion for the original process
$X$ is, therefore,

$l(y_1,y_0,\delta,\beta,\varepsilon) = -\frac{1}{2}\ln(2\pi\delta) - \frac{1}{2\delta}\left(\int_{y_0}^{y_1}\frac{dx}{\sigma(x;\gamma)}\right)^2 - \frac{1}{2}\ln(\sigma^2(y_1;\gamma)) + \text{higher-order terms}$.

The score with respect to $\gamma$ is

$\dot l(y_1,y_0,\delta,\beta,\varepsilon) = \frac{1}{\delta}\left(\int_{y_0}^{y_1}\frac{dx}{\sigma(x;\gamma)}\right)\left(\int_{y_0}^{y_1}\frac{\partial\sigma(x;\gamma)/\partial\gamma}{\sigma(x;\gamma)^2}\,dx\right) - \frac{\partial\sigma(y_1;\gamma)/\partial\gamma}{\sigma(y_1;\gamma)} + \text{higher-order terms}$.

$H$ must contain the coefficient of $\delta^{-1}$ in $h$, that is,

(20)  $H(y_1,y_0,\delta,\beta,\varepsilon) = \left(\int_{y_0}^{y_1}\frac{dx}{\sigma(x;\gamma)}\right)\left(\int_{y_0}^{y_1}\frac{\partial\sigma(x;\gamma)/\partial\gamma}{\sigma(x;\gamma)^2}\,dx\right)$.

But we are free to add the terms of order $\delta$ and higher to $H$, provided we
subtract them from $\tilde h$ so as to leave $h$ unchanged. For instance, a convenient
choice is

(21)  $H(y_1,y_0,\delta,\beta,\varepsilon) = \left(\int_{y_0}^{y_1}\frac{dx}{\sigma(x;\gamma)}\right)\left(\int_{y_0}^{y_1}\frac{\partial\sigma(x;\gamma)/\partial\gamma}{\sigma(x;\gamma)^2}\,dx\right) - \delta\,\frac{\partial\sigma(y_1;\gamma)/\partial\gamma}{\sigma(y_1;\gamma)}$

and then $\tilde h = h - H/\delta$. Both choices of $H$ satisfy (19) because $H$ has the
form $H = a(y_1)b(y_1) - \delta c(y_1)$, where $a$ and $b$ denote, respectively, the two
integrals and $c$ the coefficient of $\delta$ in (21); $c = 0$ in (20). At $y_1 = y_0$ and
$\delta = 0$, we have $a = b = 0$, thus, $H = 0$; and $H' = a'b + ab' - \delta c' = 0$; and
$\dot H = \dot a b + a\dot b - \delta\dot c = 0$. Note also that

$A_{\beta_0}\cdot H = -c + \mu(a'b + ab' - \delta c') + (\sigma^2/2)(2a'b' + a''b + ab'' - \delta c'')$,

where prime denotes differentiation with respect to $y_1$. At $y_1 = y_0$ and $\delta = 0$,
we have $a = b = 0$, thus, $A_{\beta_0}\cdot H = -c(y_0) + \sigma^2(y_0;\gamma_0)\,a'(y_0)\,b'(y_0) = 0$. And
this $H$ does not depend on $\varepsilon$, so $\partial H/\partial\varepsilon = 0$, and the likelihood score is a
martingale estimating function, hence unbiased, and so $\partial\bar\beta/\partial\varepsilon = 0$. Thus,
by adding the additional term to $H$ in (21) relative to (20), we also have
that $\Gamma_{\beta_0}\cdot H = 0$ at $y_1 = y_0$ and $\delta = 0$, which makes such an $H$ closer to
being all by itself a
martingale in a sense we make precise in Section 3.4. It makes no difference
for the exact likelihood since $h$ is always a martingale, even if $\tilde h$ and $H$ are not
separately, but this is convenient when analyzing likelihood approximations
such as the Euler case in Section 4.2.

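3.3. The $D_\beta$ and $S_{\beta,0}$ matrices. In order to obtain expansions of the
form (3) for $\Omega_\beta$, we work on its components $D_\beta$, $S_{\beta,0}$ and $T_\beta = S_\beta - S_{\beta,0}$.
The first result uses our generalized infinitesimal generator to provide the
expansions of the matrices $D_\beta$ and $S_{\beta,0}$:

Lemma 1 (Expansions for $D_\beta$ and $S_{\beta,0}$). Let $h = (h_1,\ldots,h_r)'$ denote
a vector of moment functions $h = \tilde h + \Delta^{-1}H$ satisfying Assumption 3, with
$\tilde h$ and $H$ in the corresponding classes $\mathcal{D}^J$. Also assume (27) and the
other conditions on $q_i$ in Lemma 2.

1. In the case where $H$ is identically zero, we have

(22)  $D_\beta = E_{Y_0}[\dot h] + \varepsilon E_{\Delta_0,Y_0}[(\Gamma_{\beta_0}\cdot\dot h)] + \frac{\varepsilon^2}{2}E_{\Delta_0,Y_0}[(\Gamma^2_{\beta_0}\cdot\dot h)] + O(\varepsilon^3)$

and, with the notation $(h\times h')(y_1,y_0,\delta,\beta,\varepsilon) = h(y_1,y_0,\delta,\beta,\varepsilon)\,h(y_1,y_0,\delta,\beta,\varepsilon)'$,
we have

(23)  $S_{\beta,0} = E_{Y_0}[(h\times h')] + \varepsilon E_{\Delta_0,Y_0}[(\Gamma_{\beta_0}\cdot(h\times h'))] + \frac{\varepsilon^2}{2}E_{\Delta_0,Y_0}[(\Gamma^2_{\beta_0}\cdot(h\times h'))] + O(\varepsilon^3)$.

2. In the case where $H$ is not zero, (22) and (23) should be evaluated at $\tilde h$
rather than $h$, yielding $D_\beta^{\tilde h}$ and $S_{\beta,0}^{\tilde h}$, respectively. Then
$D_\beta = D_\beta^{\tilde h} + D_\beta^{H}$ and $S_{\beta,0} = S_{\beta,0}^{\tilde h} + S_{\beta,0}^{H}$, where

(24)  $D_\beta^{H} = E_{\Delta,Y_0}[\Delta_0^{-1}(\Gamma_{\beta_0}\cdot\dot H)] + \frac{\varepsilon}{2}E_{\Delta,Y_0}[\Delta_0^{-1}(\Gamma^2_{\beta_0}\cdot\dot H)] + O(\varepsilon^2)$,

(25)  $S_{\beta,0}^{H} = E_{\Delta,Y_0}[\Delta_0^{-1}(\Gamma_{\beta_0}\cdot(\tilde h\times H'))] + \frac{\varepsilon}{2}E_{\Delta,Y_0}[\Delta_0^{-1}(\Gamma^2_{\beta_0}\cdot(\tilde h\times H'))]$
      $+ E_{\Delta,Y_0}[\Delta_0^{-1}(\Gamma_{\beta_0}\cdot(H\times\tilde h'))] + \frac{\varepsilon}{2}E_{\Delta,Y_0}[\Delta_0^{-1}(\Gamma^2_{\beta_0}\cdot(H\times\tilde h'))]$
      $+ \frac{1}{2}E_{\Delta,Y_0}[\Delta_0^{-2}(\Gamma^2_{\beta_0}\cdot(H\times H'))] + \frac{\varepsilon}{6}E_{\Delta,Y_0}[\Delta_0^{-2}(\Gamma^3_{\beta_0}\cdot(H\times H'))] + O(\varepsilon^2)$.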
3.4. How far is $h$ from a martingale? Next, we turn to an analysis of
the more challenging time series matrix $T_\beta = S_\beta - S_{\beta,0}$. The simplest case
arises when the moment function is a martingale,

(26)  $E_{\Delta,Y_1}[h(Y_1,Y_0,\Delta,\beta_0,\varepsilon)\mid Y_0] = 0$.

In this circumstance, $S_{\beta,j} = 0$ for all $j \ne 0$, and so $T_\beta = 0$.

Even in the serially correlated case, however, we will show that the sum of
these time series terms can, nonetheless, be small when $\varepsilon$ is small. Intuitively,
the closer $h$ is to a martingale, the smaller $T_\beta = S_\beta - S_{\beta,0}$.

To define what we mean by the distance from a moment function to a
martingale, denote by $h_i$ the $i$th element of the vector of moment functions
$h$, and define $q_i$ and $\alpha_i$ by

(27)  $E_{\Delta,Y_1}[h_i(Y_1,Y_0,\Delta,\bar\beta,\varepsilon)\mid Y_0] = \varepsilon^{\alpha_i}q_i(Y_0,\beta_0,\varepsilon) = \varepsilon^{\alpha_i}q_i(Y_0,\beta_0,0) + O(\varepsilon^{\alpha_i+1})$,

where $\alpha_i$ is an integer greater than or equal to zero for each moment function
$h_i$. $\alpha_i$ is an index of the order at which the moment component $h_i$ deviates
from a martingale (note that in a vector $h$ not all components $h_i$ need to
have the same index $\alpha_i$). A martingale moment function corresponds to the
limiting case where $\alpha_i = +\infty$, $q_i(Y_0,\beta_0,\varepsilon)$ is identically zero, and $S_\beta = S_{\beta,0}$.
When the moment functions are not martingales, we will show that the
difference $T_\beta = S_\beta - S_{\beta,0}$ is a matrix whose element $(i,j)$ has a leading
term of order $O(\varepsilon^{\min(\alpha_i,\alpha_j)})$ that depends on $q_i$ and $q_j$. As will become
apparent in the following sections, (27) holds in all the estimation methods
we consider.

Note that $E_{\Delta,Y_1,Y_0}[h_i(Y_1,Y_0,\Delta,\bar\beta,\varepsilon)] = 0$ by definition of $\bar\beta$, hence, by the
law of iterated expectations we have that $E_{Y_0}[q_i(Y_0,\beta_0,\varepsilon)] = 0$. We will also
need the function $r_i(y,\beta_0,\varepsilon)$ defined as

(28)  $r_i(y_0,\beta_0,\varepsilon) = -\int_0^{\infty} U_t\cdot A_{\beta_0}\cdot q_i(y_0,\beta_0,\varepsilon)\,E[\tau_{N_t+1}]\,dt$,

where $U_\delta\cdot f(y_0,\delta,\beta,\varepsilon) = E_{Y_1}[f(Y_1,Y_0,\Delta,\beta,\varepsilon)\mid Y_0 = y_0, \Delta = \delta]$ is the
conditional expectation operator. Recall that $\tau_i$ are the sampling times for $X$,
and that $N_t = \#\{\tau_i \in (0,t]\}$ (so $\tau_0 = 0$ is not counted). We can assert the
following about $r_i$:

Lemma 2. Under Assumption 1 and (27), we suppose that qifYo, Pq,0) 
and j ^2 igf [Ap^ ■ qpfY, Po,£) be defined, bounded 

and continuous in L'^-norm on an intervale G [0,eo] (^o > 0). Thenri{y,Po,£) 
is well defined. Also, 

(29)  $r_i(Y_0,\beta_0,\varepsilon) = \check r_i(Y_0,\beta_0,\varepsilon) + \frac{1}{2}\,\varepsilon\,\frac{E[\Delta_0^2]}{E[\Delta_0]}\,q_i(Y_0,\beta_0,0) + o_p(\varepsilon)$,

where

(30)  $\check r_i(y_0,\beta_0,\varepsilon) = -\int_0^{\infty} t\,(U_t\cdot A_{\beta_0}\cdot q_i)(y_0,\beta_0,\varepsilon)\,dt = \int_0^{\infty} (U_t\cdot q_i)(y_0,\beta_0,\varepsilon)\,dt$.

Alternatively, $\check r_i$ can be defined as the solution of the differential equation

(31)  $A_{\beta_0}\cdot\check r_i(\cdot,\beta_0,\varepsilon) = -q_i(\cdot,\beta_0,\varepsilon)$

with the side condition that $E_{Y_0}[\check r_i(Y_0,\beta_0,\varepsilon)] = 0$.

By convention, here and in the proofs $O_p(f(\varepsilon))$ and $o_p(f(\varepsilon))$ refer to terms
whose $L^2$ norms are, respectively, $O(f(\varepsilon))$ and $o(f(\varepsilon))$.

Finally, an alternative form of $r_i$ is given in the proof of Lemma 2; see (64).
While the index $\alpha_i$ and the function $q_i$ play a crucial role in determining
the order in $\varepsilon$ of the matrix $T_\beta$, the function $r_i$ will play an important role
in the determination of its coefficients.

3.5. The $T_\beta$ matrix. Putting all this together, we can calculate $T_\beta$ when
$h$ is not a martingale estimating equation. The expansion of the matrix $T_\beta$
is obtained by applying the operator $\Gamma_{\beta_0}$ as follows:

Lemma 3 (Expansions for $T_\beta$). Under the assumptions of Lemma 1,
assume also (27) and the other conditions on $q_i$ in Lemma 2. Then we have
the following:

1. If $H$ is zero, the $(i,j)$ term of the time series matrix $T_\beta = S_\beta - S_{\beta,0}$ is
given by



■ Olj + l 


(32) 


H - —-EAo,yo[r|(, • {hi X rj)] ^E’yg[(/ij x r*)] 

+ e“*£'Ao,yo[(r/3o • {hj x r*))] 
2. If $H$ is nonzero, then (32) should be evaluated at $\tilde h$ rather than $h$, yielding
$T_\beta^{\tilde h}$. And $T_\beta = T_\beta^{\tilde h} + T_\beta^{H}$, where


(33) 



1 

i?[Ao] 


+ 0(e 


(e“^-iEA,Yo[AoHr/3o-^0xr,] 

+ ^^Ao,Yo[Ao'(r|„-(i^*xr,))] 
+ e“«-iEAo,Yo[Ao'(r/3o-i^,)xr,] 

+ ^^Ao,Yo[Ao'(r^o-(^iX^*))]) 

min(ai,Oj)+l'j 


Note that for most applications of Lemmas 1-3, the assumption of (27)
and the other conditions on $q_i$ in Lemma 2 follow from the other assumptions
of Lemma 3, as follows. Normally, one can take $\alpha_i$ to be $\le 2$, since the error
term in (32) need not be smaller than that of (23), and the error term in
(33) need not be smaller than that of (25). The conditions mentioned from
Lemma 2 follow if hi £ and Hi £ 2?^ (more generally, hi £ 2 ?"®+^ and 

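3.6. Form of the asymptotic variance matrix $\Omega_\beta$. By combining our
previous results concerning $D_\beta$, $S_{\beta,0}$ and $T_\beta$, we can now obtain an expression
for the matrix $\Omega_\beta$. Specifically, we have the following as an example of a
typical situation:

Theorem 1 (Form of the matrix $\Omega_\beta$). Under the conditions of the
preceding lemmas we have the following:

1. When we are only estimating $\theta$, with $\gamma_0$ known, using a vector $h$ such
that $H = 0$, and $D_\theta$ and $S_\theta$ have the expansions

$D_\theta = \varepsilon D_\theta^{(1)} + \varepsilon^2 D_\theta^{(2)} + O(\varepsilon^3)$,

$S_\theta = \varepsilon S_\theta^{(1)} + \varepsilon^2 S_\theta^{(2)} + O(\varepsilon^3)$,

then the asymptotic variance of the estimator has the expansion
$\Omega_\theta = \Omega_\theta^{(0)} + \varepsilon\Omega_\theta^{(1)} + O(\varepsilon^2)$, where

$\Omega_\theta^{(0)} = E[\Delta_0]\,S_\theta^{(1)}/(D_\theta^{(1)})^2$,

$\Omega_\theta^{(1)} = E[\Delta_0]\,(D_\theta^{(1)}S_\theta^{(2)} - 2D_\theta^{(2)}S_\theta^{(1)})/(D_\theta^{(1)})^3$.

2. When we are only estimating $\gamma$, with $\theta_0$ known, and $D_\gamma$ and $S_\gamma$ have the
expansions

$D_\gamma = D_\gamma^{(0)} + \varepsilon D_\gamma^{(1)} + \varepsilon^2 D_\gamma^{(2)} + O(\varepsilon^3)$,

$S_\gamma = S_\gamma^{(0)} + \varepsilon S_\gamma^{(1)} + \varepsilon^2 S_\gamma^{(2)} + O(\varepsilon^3)$,

then the asymptotic variance of the estimator has the expansion
$\Omega_\gamma = \varepsilon\Omega_\gamma^{(1)} + \varepsilon^2\Omega_\gamma^{(2)} + \varepsilon^3\Omega_\gamma^{(3)} + O(\varepsilon^4)$,
where

$\Omega_\gamma^{(1)} = E[\Delta_0]\,S_\gamma^{(0)}/(D_\gamma^{(0)})^2$,

$\Omega_\gamma^{(2)} = E[\Delta_0]\,(D_\gamma^{(0)}S_\gamma^{(1)} - 2D_\gamma^{(1)}S_\gamma^{(0)})/(D_\gamma^{(0)})^3$,

$\Omega_\gamma^{(3)} = E[\Delta_0]\,(3(D_\gamma^{(1)})^2S_\gamma^{(0)} - 2D_\gamma^{(0)}D_\gamma^{(1)}S_\gamma^{(1)} - 2D_\gamma^{(0)}D_\gamma^{(2)}S_\gamma^{(0)} + (D_\gamma^{(0)})^2S_\gamma^{(2)})/(D_\gamma^{(0)})^4$.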

3. When we are estimating $\theta$ and $\gamma$ jointly, and the $D_\beta$ and $S_\beta$ matrices
have the expansions


Dp = 
Sp = 



then the asymptotic variance of the estimator $\hat\beta$ is

Q _ f +0(e^) +0(e^) \ 

^ \ euj^^g + O(e^) + £^^77 + O(e^) / ’ 

where 

^ee ~ ^i^o]s00 /{dgj) , 

^00 ~ ^i‘^o]{d00 SqJ — ‘^d^ee s^ee)/{d^ee) 1 
= ^7 = ^[^o]se 7 V (40 4 ?)> 
4i,)=i5;[Ao]4°7V(4°7)f, 
a;^ = ^[Ao](4?47 - 24i^)4?J)/(d4 


In particular, the diagonal leading terms $\omega_{\theta\theta}^{(0)}$ and $\omega_{\gamma\gamma}^{(1)}$ corresponding to
efficient estimation with a continuous record of observation are identical
to their single-parameter counterparts.


An important fact to note from the above expressions is that to first
order in $\varepsilon$, the asymptotic variances of $\theta$ and $\gamma$ are unaffected by whether
one estimates just one of them (and the other one is known) or one estimates
both of them jointly. This is not necessarily the case for the higher-order
terms in the asymptotic variances, since those depend upon the higher-order
terms in the $D_\beta$ and $S_\beta$ matrices which are not necessarily identical to those
of their single-parameter counterparts.

Also, the leading term in $\Omega_\theta$ corresponding to efficient estimation with a
continuous record of observations is

$\Omega_\theta^{(0)} = (E_{Y_0}[(\partial\mu(Y_0;\theta_0)/\partial\theta)^2\,\sigma(Y_0;\gamma_0)^{-2}])^{-1}$

provided $\mu$ is continuously differentiable with respect to $\theta$. And the leading
term in $\Omega_\gamma$ corresponding to efficient estimation of $\gamma$ is

$\Omega_\gamma^{(1)} = E[\Delta_0]\,(2E_{Y_0}[(\partial\sigma(Y_0;\gamma_0)/\partial\gamma)^2\,\sigma(Y_0;\gamma_0)^{-2}])^{-1}$

provided $\sigma$ is continuously differentiable with respect to $\gamma$. In the special
case where $\sigma^2 = \gamma$ constant, then this becomes $\Omega_\gamma^{(1)} = 2\sigma_0^4\,E[\Delta_0]$.

These leading terms are achieved, in particular, when $h$ is the likelihood
score for $\theta$ and $\gamma$, respectively, but also by other estimating functions that
are able to mimic the behavior of the likelihood score at the leading order.
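
The coefficient formulas of this type follow from expanding $E[\Delta]\,S/D^2$ in powers of $\varepsilon$ when the parameter is scalar, and the algebra is easy to check numerically. The sketch below does so for the case where $D$ and $S$ start at order $\varepsilon^0$ (the situation of estimating $\gamma$ alone). The scalar setting, the numerical values and the function names are assumptions made purely for this check.

```python
def omega_coefficients(e_delta0, D, S):
    """Coefficients of eps, eps^2, eps^3 in E[Delta] * S / D^2 for scalar
    D = D0 + eps*D1 + eps^2*D2 and S = S0 + eps*S1 + eps^2*S2."""
    D0, D1, D2 = D
    S0, S1, S2 = S
    o1 = e_delta0 * S0 / D0 ** 2
    o2 = e_delta0 * (D0 * S1 - 2.0 * D1 * S0) / D0 ** 3
    o3 = e_delta0 * (3.0 * D1 ** 2 * S0 - 2.0 * D0 * D1 * S1
                     - 2.0 * D0 * D2 * S0 + D0 ** 2 * S2) / D0 ** 4
    return o1, o2, o3

def omega_direct(e_delta0, D, S, eps):
    """E[Delta] * S / D^2 evaluated directly, with E[Delta] = eps * E[Delta_0]."""
    d = D[0] + eps * D[1] + eps ** 2 * D[2]
    s = S[0] + eps * S[1] + eps ** 2 * S[2]
    return eps * e_delta0 * s / d ** 2

# Arbitrary illustrative values for the expansion coefficients.
e_delta0 = 1.2
D = (2.0, 0.5, -0.3)
S = (1.5, 0.4, 0.2)
eps = 1e-3
o1, o2, o3 = omega_coefficients(e_delta0, D, S)
series = eps * o1 + eps ** 2 * o2 + eps ** 3 * o3
gap = abs(omega_direct(e_delta0, D, S, eps) - series)   # should be O(eps^4)
```

The remaining gap is of order $\varepsilon^4$, which is consistent with a three-term expansion of the variance in this case.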


3.7. Inconsistency. For the estimator to be consistent, it must be that
$\bar\beta = \beta_0$ but, again, this will not be the case for every estimation method.
However, in all the cases we consider, and one may argue for any reasonable
estimation method, the bias will disappear in the limit where $\varepsilon \to 0$, that
is, $\bar\beta(\beta_0,0) = \beta_0$ (so that there is no bias in the limiting case of continuous
sampling) and the following expansion

(34)  $\bar\beta = \bar\beta(\beta_0,\varepsilon) = \beta_0 + \sum_{q=1}^{Q}\varepsilon^q b^{(q)} + o(\varepsilon^Q)$

holds for some $Q \ge 1$. The coefficients $b^{(q)} = (1/q!)\,\partial^q\bar\beta(\beta_0,0)/\partial\varepsilon^q$ can be
determined as follows. By the definition of $\bar\beta$,

(35)  $E_{\Delta,Y_1,Y_0}[h(Y_1,Y_0,\Delta,\bar\beta(\beta_0,\varepsilon),\varepsilon)] = 0$.

Consider the case where $H = 0$. Recognizing that $\bar\beta$ is a function of $\varepsilon$, as
given in (34), we can compute the Taylor series expansion

$E_{Y_1}[h(Y_1,Y_0,\Delta,\bar\beta,\varepsilon)\mid Y_0,\Delta] = \sum_{j=0}^{J}\frac{\varepsilon^j}{j!}(\Gamma^j_{\beta_0}\cdot h)(Y_0,Y_0,0,\beta_0,0) + O_p(\varepsilon^{J+1})$,

whose unconditional expectation, in light of (35), must be zero at each order
in $\varepsilon$. So to determine $b^{(1)}$, set to zero the coefficient of $\varepsilon$ in the series
expansion of $E_{\Delta,Y_1,Y_0}[h(Y_1,Y_0,\Delta,\bar\beta(\beta_0,\varepsilon),\varepsilon)] = E_{\Delta,Y_0}[E_{Y_1}[h(Y_1,Y_0,\Delta,\bar\beta(\beta_0,\varepsilon),\varepsilon)\mid Y_0,\Delta]]$:

$0 = E_{\Delta,Y_0}[(\Gamma_{\beta_0}\cdot h)(Y_0,Y_0,0,\beta_0,0)]$
$\;\; = E[\Delta_0]\,E_{Y_0}[(A_{\beta_0}\cdot h)(Y_0,Y_0,0,\beta_0,0)] + E_{Y_0}\!\left[\frac{\partial h}{\partial\varepsilon}(Y_0,Y_0,0,\beta_0,0)\right] + E_{Y_0}[\dot h(Y_0,Y_0,0,\beta_0,0)]\,b^{(1)}$

and, hence, if $E_{Y_0}[\dot h(Y_0,Y_0,0,\beta_0,0)] \ne 0$,

(37)  $b^{(1)} = -(E[\Delta_0]\,E_{Y_0}[(A_{\beta_0}\cdot h)(Y_0,Y_0,0,\beta_0,0)] + E_{Y_0}[(\partial h/\partial\varepsilon)(Y_0,Y_0,0,\beta_0,0)])\,(E_{Y_0}[\dot h(Y_0,Y_0,0,\beta_0,0)])^{-1}$.

Then given $b^{(1)}$, setting the coefficient of $\varepsilon^2$ in that series expansion to zero
determines $b^{(2)}$, and so on. If $E_{Y_0}[\dot h(Y_0,Y_0,0,\beta_0,0)] = 0$, then one needs to
look at the next order term in the expansion to determine $b^{(1)}$, and so on.
This is, for instance, what happens in the Euler scheme when estimating $\theta$;
see Section 4.2.

If $H \ne 0$, then (36) incorporates both $\tilde h$ and $H$, and one proceeds
analogously to determine $b^{(1)}$ and the following coefficients by setting the
coefficients of the expansion of (35) to 0. For an example of this, see the estimation
of $\sigma^2$ using the Euler scheme.

4. Application to specific inference strategies. We now apply the general
results to specific instances of moment functions $h$, corresponding both to
likelihood and nonlikelihood inference strategies, for the model where $\sigma^2 = \gamma$
constant.

4.1. Maximum-likelihood type estimators. The development of Ait-Sahalia and Mykland 
(2003) deals with likelihood type inference, and we recapitulate here the in¬ 
ference schemes in that work, and how they relate to the present paper. 

We applied the general results of the present paper to maximum likelihood
estimation, using three different inference strategies:

1. FIML: Full information maximum likelihood, using the bivariate
observations $(Y_n, \Delta_n)$.

2. IOML: Partial information maximum likelihood estimator using only the
state observations $Y_n$, with the sampling intervals integrated out.

3. PFML: Pseudo maximum likelihood estimator pretending that the
sampling intervals are fixed at $\Delta_n = \bar\Delta$.

All three estimators rely on maximizing a version of the likelihood function
of the observations, that is, some functional of the transition density $p$ of the
$X$ process: $p(Y_n|Y_{n-1},\Delta_n,\theta)$ for FIML; $p(Y_n|Y_{n-1},\theta) = E_{\Delta_n}[p(Y_n|Y_{n-1},\Delta_n,\theta)]$
for IOML; and $p(Y_n|Y_{n-1},\bar\Delta,\theta)$ for PFML (which is like FIML except that
$\bar\Delta$ is used in place of the actual $\Delta_n$). The extent to which these estimators
differ from one another gave rise to different “costs.” FIML is asymptotically
efficient, making the best possible use of the joint discretely sampled data
$(Y_n,\Delta_n)$. The extent to which FIML with these data is less efficient than
the corresponding FIML when the full sample path is observable is what we
called the cost of discreteness. IOML is the asymptotically optimal choice
if one recognizes that the sampling intervals $\Delta_n$ are random but does not
observe them. The extra efficiency loss relative to FIML is what we called
the cost of randomness. PFML corresponds to the “head-in-the-sand” policy
consisting of doing as if the sampling intervals were all identical (pretending
that $\Delta_n = \bar\Delta$) when, in fact, they are random. The extent by which PFML
underperforms FIML is what we called the cost of ignoring the randomness.
We then studied the relative magnitude of these costs in various situations.

The respective scores from these likelihoods are special cases of the
estimating functions $h$ of the present paper. But the results of the present
paper apply to a much wider class of estimating functions than the three
likelihood examples, such as the following.
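As a rough illustration of how the three likelihood functionals differ, the following Python sketch evaluates the per-observation log-likelihood terms in a model whose transition density is available in closed form. It assumes the Ornstein-Uhlenbeck process of Section 4.3 and, purely for illustration, exponentially distributed sampling intervals; the function names and the midpoint quadrature used to integrate out the interval are hypothetical choices, not part of the original development.

```python
import math


def ou_logpdf(y1, y0, dt, theta, sigma2):
    """Exact OU transition log-density log p(y1 | y0, dt; theta, sigma2)."""
    mean = y0 * math.exp(-theta * dt)
    var = sigma2 * (1.0 - math.exp(-2.0 * theta * dt)) / (2.0 * theta)
    return -0.5 * math.log(2.0 * math.pi * var) - (y1 - mean) ** 2 / (2.0 * var)


def fiml_term(y1, y0, dt, theta, sigma2):
    """FIML: uses the interval dt actually observed with the pair (y0, y1)."""
    return ou_logpdf(y1, y0, dt, theta, sigma2)


def pfml_term(y1, y0, dt_bar, theta, sigma2):
    """PFML: same density, but every interval is replaced by the constant dt_bar."""
    return ou_logpdf(y1, y0, dt_bar, theta, sigma2)


def ioml_term(y1, y0, mean_dt, theta, sigma2, n_grid=4000, cutoff=30.0):
    """IOML: log of E_dt[p(y1 | y0, dt)], with dt ~ Exponential(mean_dt) (assumed law).

    The expectation is approximated by a midpoint rule on (0, cutoff * mean_dt).
    """
    h = cutoff * mean_dt / n_grid
    total = 0.0
    for i in range(n_grid):
        dt = (i + 0.5) * h
        weight = math.exp(-dt / mean_dt) / mean_dt * h  # exponential density times step
        total += weight * math.exp(ou_logpdf(y1, y0, dt, theta, sigma2))
    return math.log(total)
```

Summing `fiml_term`, `ioml_term` or `pfml_term` over the sample and maximizing over the parameters would give the corresponding estimator; the quadrature in `ioml_term` is only one of several ways the interval could be integrated out.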


4.2. Estimator based on the discrete Euler scheme. We now apply our
general results to study the properties of estimators of the drift and diffusion
coefficients obtained by replacing the true likelihood function $l(y_1|y_0,\delta,\beta)$
with its discrete Euler approximation

$$
l^E(y_1|y_0,\delta,\beta) = -\frac{1}{2}\ln(2\pi\sigma^2\delta) - \frac{(y_1 - y_0 - \mu(y_0;\theta)\delta)^2}{2\sigma^2\delta}.
\tag{38}
$$

This estimator is commonly used in empirical work in finance, where
researchers often write a theoretical model set in continuous time but then
switch gear in their empirical work, in effect estimating the parameters
$\beta = (\theta,\sigma^2)$ of the discrete time series model

$$
X_{t+\Delta} - X_t = \mu(X_t;\theta)\Delta + \sigma\sqrt{\Delta}\,\eta_{t+\Delta},
\tag{39}
$$

where the disturbance $\eta$ is $N(0,1)$. The properties of this estimator have
been studied in the case where $\Delta$ is not random by Florens-Zmirou (1989).
Our results apply to this particular situation as a special case.

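In the terminology of Section 3, our vector of moment functions is

$$
h(y_1,y_0,\delta,\beta,\varepsilon)
= \begin{pmatrix} (\partial l^E/\partial\theta)(y_1|y_0,\delta,\beta) \\ (\partial l^E/\partial\sigma^2)(y_1|y_0,\delta,\beta) \end{pmatrix}
= \begin{pmatrix} (\partial\mu/\partial\theta)(y_0;\theta)\,(y_1 - y_0 - \mu(y_0;\theta)\delta)/\sigma^2 \\ -1/(2\sigma^2) + (y_1 - y_0 - \mu(y_0;\theta)\delta)^2/(2\sigma^4\delta) \end{pmatrix}
\tag{40}
$$

when both parameters in $\beta = (\theta,\sigma^2)$ are unknown, and reduces to one
component when only one parameter is unknown. For this choice of $h$, (13) is
not satisfied and, thus, the estimator is inconsistent. Note also that the
solution in $\theta$ of $E_{\Delta,Y_1,Y_0}[h_1(Y_1,Y_0,\Delta,\beta,\varepsilon)] = 0$ is independent of $\sigma^2$ and, hence,
whether or not we are estimating $\sigma^2$ does not affect the estimator of the
drift parameter. Of course, this will not be the case in general for the true
maximum likelihood estimator.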
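The inconsistency of the Euler drift estimator can be seen in a small simulation. The sketch below is an illustration under stated assumptions rather than the paper's procedure: it takes the Ornstein-Uhlenbeck drift $\mu(y;\theta) = -\theta y$, draws the process exactly at i.i.d. exponential sampling intervals (an assumed law), and solves the first equation of (40) in closed form. For this special case a direct calculation, not taken from the text, gives the probability limit $(1 - E[e^{-\theta_0\Delta}])/E[\Delta]$, which equals $\theta_0/(1+\theta_0 E[\Delta])$ for exponential intervals. All function names are hypothetical.

```python
import math
import random


def simulate_ou(theta, sigma, mean_dt, n, rng):
    """Exact draws of an OU process at i.i.d. Exponential(mean_dt) sampling intervals."""
    y = rng.gauss(0.0, sigma / math.sqrt(2.0 * theta))  # start from the stationary law
    ys, dts = [y], []
    for _ in range(n):
        dt = rng.expovariate(1.0 / mean_dt)
        a = math.exp(-theta * dt)
        sd = sigma * math.sqrt((1.0 - a * a) / (2.0 * theta))
        y = a * y + sd * rng.gauss(0.0, 1.0)
        ys.append(y)
        dts.append(dt)
    return ys, dts


def euler_drift_estimate(ys, dts):
    """Root in theta of the summed Euler score for mu(y; theta) = -theta * y.

    The score is proportional to sum y0 * (y1 - y0 + theta * y0 * dt),
    so the root does not depend on sigma^2.
    """
    num = sum(y0 * (y1 - y0) for y0, y1 in zip(ys[:-1], ys[1:]))
    den = sum(y0 * y0 * dt for y0, dt in zip(ys[:-1], dts))
    return -num / den


def demo(seed=0, theta0=1.0, sigma=1.0, mean_dt=0.1, n=200_000):
    rng = random.Random(seed)
    ys, dts = simulate_ou(theta0, sigma, mean_dt, n, rng)
    return euler_drift_estimate(ys, dts)
```

With $\theta_0 = 1$ and $E[\Delta] = 0.1$, `demo()` should land near $1/1.1$ rather than near 1, up to sampling error.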

As we discussed in the general case, the asymptotic bias of the estimator,
$\bar\beta - \beta_0$, will be of order $O(\varepsilon)$ or smaller. In this particular case, if $\sigma^2$ is
known, the bias in $\bar\theta$ is of order $O(\varepsilon)$. As in the general setting of Section
3, $\sqrt{T}(\hat\beta - \bar\beta)$ converges in law to $N(0,\Omega_\beta)$ and an application of Lemmas 1
and 3 yields the following.

1. When we are only estimating $\theta$, with $\sigma_0^2$ known, using only the first
equation in (40), we have $\alpha_1 = 2$ and


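$$
q_1(y,\beta_0,0) = \frac{E[\Delta_0^2]}{4\sigma_0^2}
\left\{
\frac{\sigma_0^2\,E_{Y_0}[(\partial\mu/\partial y)(Y_0;\theta_0)\,(\partial^2\mu/(\partial y\,\partial\theta))(Y_0;\theta_0)]\,((\partial\mu/\partial\theta)(y;\theta_0))^2}
     {E_{Y_0}[((\partial\mu/\partial\theta)(Y_0;\theta_0))^2]}
+ \frac{\partial\mu}{\partial\theta}(y;\theta_0)
  \left(2\mu(y;\theta_0)\frac{\partial\mu}{\partial y}(y;\theta_0) + \sigma_0^2\frac{\partial^2\mu}{\partial y^2}(y;\theta_0)\right)
\right\}
\tag{41}
$$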

in (27). The bias of the drift estimator is 

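$$
\bar\theta - \theta_0
= -\varepsilon\,\frac{\sigma_0^2\,E[\Delta_0^2]\,E_{Y_0}[(\partial\mu/\partial y)(Y_0;\theta_0)\,(\partial^2\mu/(\partial y\,\partial\theta))(Y_0;\theta_0)]}
                     {4\,E[\Delta_0]\,E_{Y_0}[((\partial\mu/\partial\theta)(Y_0;\theta_0))^2]}
+ O(\varepsilon^2)
\tag{42}
$$

and its asymptotic variance is $\Omega_\theta = \Omega_\theta^{(0)} + \Omega_\theta^{(1)}\varepsilon + O(\varepsilon^2)$ with $\Omega_\theta^{(0)} = \sigma_0^2\,(E_{Y_0}[((\partial\mu/\partial\theta)(Y_0;\theta_0))^2])^{-1}$
(the limiting term corresponding to a continuous record of observations) and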


m = 


(Ji 


\mi] 


2E[Ao]EYMmdO)(Yo-,eo)y]^ 


X ( 2fTo^Yo 
+ Eyo 




Eyo 






2^2t-’(2) 

ml] 


+ 2Eyo 


- mYo 




where 


rf = -2EY,[qi{Yo,PoM=mMYo,Po,0)GiiYo,Po)] 

with Gi{yo,Po) = p° fi{zo,0o) cIzq. 

2. When we are only estimating $\sigma^2$, with $\theta_0$ known, using only the second
equation in (40), we have $\alpha_2 = 1$ and


(43) q2{y,m) = - Ey, 
in (27). The bias of the diffusion estimator is


(44) 


cr^ - do = eE[Ao]aoEYo 


5/i 

2 

+ e^?fElAl]Ey„ 


{Yo;eo) 


+ 0{e^) 


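and its asymptotic variance is $\Omega_{\sigma^2} = \varepsilon\,\Omega_{\sigma^2}^{(1)} + \varepsilon^2\,\Omega_{\sigma^2}^{(2)} + O(\varepsilon^3)$ with $\Omega_{\sigma^2}^{(1)} = 2\sigma_0^4 E[\Delta_0]$
(the same first-order term as MLE) and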



4a^oE[Ao]{E[Ao]EY, 


^{Yo-,9o) 

oy 




where 

=4Eyj52(do,/3o,0)G2(yo,/?o)] + -^^^EYMYo,Mr2{Yo,f5oM 

with G 2 {yo,Po) = -ctq^ r° IJ^{zo,9o)dzo. 

3. When we are estimating $\theta$ and $\sigma^2$ jointly, using both equations in (40),
the two components of the bias vector $\bar\beta - \beta_0$ are given by (42) and (44),
respectively (to their respective orders only). We also have that $\alpha_1 = 2$, $\alpha_2 = 1$
and $q = (q_1,q_2)'$ with $q_1$ and $q_2$ given by (41) and (43), respectively. The
asymptotic variance of $\hat\beta$ is


Ofl = 


( ^ee 


d \u;^2g u;„2„2 


,(0) 


_ + + O(e^) 

V + O(e^) YO{e^) J 

where , cv^gg^ = n^g '^, and 


+ O(e^) 


,(i) 


( 1 ) 


^u2g — ^a^2 — 


2d6 


00-2 


EYo[{dfi/de{Yo-eo)) 


21 00-2, 


JX^=Aa^,E[Ao](E[Ao]EY, 




+ C7nt 


4hl) 


with tX = ‘^Eyo[Gi{Yo, Po)q 2 {Yo, Po,0)] and ^^2 

Therefore, as is to be expected when using a first-order approximation
to the stochastic differential equation, the asymptotic variance is, to first
order in $\varepsilon$, the same as for MLE inference. The impact of using the
approximation is to second order in variances (and, of course, is responsible for
bias in the estimator). When estimating one of the two parameters with the
other known, the impact of the discretization approximation on the variance
(which MLE avoids) is one order of magnitude higher than the effect of the
discreteness of the data (which MLE is also subject to).
4.3. Example: the Ornstein-Uhlenbeck process. We now specialize the
expressions above to a specific example, the stationary ($\theta > 0$)
Ornstein-Uhlenbeck process

$$
dX_t = -\theta X_t\,dt + \sigma\,dW_t.
\tag{45}
$$

The transition density $p(y_1|y_0,\delta,\beta)$ of this process is a Gaussian density
with expected value $e^{-\theta\delta}y_0$ and variance $(1 - e^{-2\theta\delta})\sigma^2/(2\theta)$. The stationary
density $\pi(y_0,\beta)$ is also Gaussian with mean 0 and variance $\sigma^2/(2\theta)$.

Because its transition density is known explicitly, this model constitutes
one of the rare instances where, in addition to our Taylor expansions which
can be calculated for any model, we can obtain exact (i.e., non-Taylor
expanded) expressions for the matrices $D_\beta$ and $T_\beta$. Specifically, for
methods relying on nonmartingale moment functions $h$, the exact calculation of
the time series term $T_\beta$ is based on

Tp = [hiYi,Yo, A, P, e)R{Yi,Po,e)], 

where Ea,Yi[KYo,Yi, A, P,e)\Yo] = e^^qiYo, Po,e) = Q{Yo, Po,e) and 

OO 

R{YuPo, e) = ^[Ao] ^ Ey, [Q{Yk,Po, e)|ki] = V(yi, /3o, e). 

k=l 

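This last expression requires the calculation of $E[Y_k^2|Y_1]$. To this end,
consider first the law of $Y_k$ given $Y_1$ and $\Delta_2,\ldots,\Delta_k$. In this case, $Y_k$ is
conditionally Gaussian with mean $Y_1\exp\{-\theta(\Delta_2+\cdots+\Delta_k)\}$ and variance
$(1 - \exp\{-2\theta(\Delta_2+\cdots+\Delta_k)\})\sigma^2/(2\theta)$. Hence, we obtain that

$$
\begin{aligned}
E[Y_k^2|Y_1]
&= E\!\left[\,Y_1^2\exp\{-2\theta(\Delta_2+\cdots+\Delta_k)\}
   + \frac{\sigma^2}{2\theta}\bigl(1 - \exp\{-2\theta(\Delta_2+\cdots+\Delta_k)\}\bigr)\,\Big|\,Y_1\right] \\
&= Y_1^2\,E[\exp\{-2\theta\Delta\}]^{k-1}
   + \frac{\sigma^2}{2\theta}\bigl(1 - E[\exp\{-2\theta\Delta\}]^{k-1}\bigr).
\end{aligned}
$$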
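The closed form for the conditional second moment can be checked against a one-step recursion: conditionally on $Y_j$, averaging the Gaussian transition over an independent interval gives $E[Y_{j+1}^2|Y_j] = cY_j^2 + (\sigma^2/(2\theta))(1-c)$ with $c = E[\exp\{-2\theta\Delta\}]$, and iterating expectations $k-1$ times should reproduce the closed form. The sketch below does this numerically; the exponential law used to obtain a value of $c$ is an assumption made only for the example, and the function names are hypothetical.

```python
def closed_form(y1, k, c, theta, sigma2):
    """y1^2 * c^(k-1) + sigma2 / (2 theta) * (1 - c^(k-1)), with c = E[exp(-2 theta Delta)]."""
    p = c ** (k - 1)
    return y1 * y1 * p + sigma2 / (2.0 * theta) * (1.0 - p)


def by_recursion(y1, k, c, theta, sigma2):
    """Iterate the one-step map m -> c * m + sigma2 / (2 theta) * (1 - c), k - 1 times."""
    m = y1 * y1
    for _ in range(k - 1):
        m = c * m + sigma2 / (2.0 * theta) * (1.0 - c)
    return m


def laplace_exponential(theta, mean_dt):
    """E[exp(-2 theta Delta)] when Delta ~ Exponential(mean_dt) (assumed law)."""
    return 1.0 / (1.0 + 2.0 * theta * mean_dt)
```

For any $k \ge 1$ the two functions should agree up to floating-point rounding, for example with `c = laplace_exponential(1.0, 0.1)`.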

In Table 1, we report results for the Ornstein-Uhlenbeck parameters
estimated one at a time (i.e., $\theta$ knowing $\sigma^2$ and $\sigma^2$ knowing $\theta$). The quantities
for the MLE are based on the developments in Aït-Sahalia and Mykland
(2003); for the discrete Euler scheme, they follow from the results above.


4.4. The effect of the distribution of the sampling intervals. One of the
implications of our results concerns the impact of the distribution of the
sampling interval on the quality of inference. It is, obviously, better to have
as many sampling times as possible, but, to move beyond this, fix $E[\Delta_0]$.
To the extent that our expansions depend on other features of the law of
$\Delta_0$, they do so through the moments $E[\Delta_0^q]$, $q \ge 2$, as can be seen from the
expressions above.

One can then compare whether it seems preferable to minimize these
higher-order moments, and thus have sampling at regular intervals, or whether
a certain amount of randomness in $\Delta_0$ is preferable. In the case of the MLE
for the Ornstein-Uhlenbeck process, it can be seen from Table 1 that the
randomness of the sampling scheme makes no difference for $\sigma^2$. On the other
hand, for the Euler estimation of $\theta$ for the same process, randomness (i.e.,
higher $E[\Delta_0^2]$) adversely affects the bias but reduces the asymptotic
variance. At the first order in $\varepsilon$, randomness has no effect on the estimation of
$\sigma^2$; at the second order, more randomness reduces the asymptotic variance
and the bias (since the first-order bias term is negative, a higher positive
second-order bias works to reduce the bias).
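The effect of randomness on the Euler drift bias can be made concrete in the Ornstein-Uhlenbeck case. The sketch below compares fixed sampling with exponential sampling at the same mean interval. It relies on a closed form derived directly for this special case (not quoted from the text): the probability limit of the Euler drift estimator is $(1 - E[e^{-\theta_0\Delta}])/E[\Delta]$, whose leading bias term is $-\theta_0^2E[\Delta^2]/(2E[\Delta])$. The exponential law and all function names are assumptions made for illustration.

```python
import math


def euler_plim(theta0, laplace, mean_dt):
    """(1 - E[exp(-theta0 * Delta)]) / E[Delta] for the OU drift mu(y) = -theta * y."""
    return (1.0 - laplace(theta0)) / mean_dt


def laplace_fixed(delta):
    """s -> E[exp(-s * Delta)] when Delta is the constant delta."""
    return lambda s: math.exp(-s * delta)


def laplace_exponential(mean_dt):
    """s -> E[exp(-s * Delta)] when Delta ~ Exponential(mean_dt)."""
    return lambda s: 1.0 / (1.0 + s * mean_dt)


def leading_bias(theta0, m1, m2):
    """First term of the bias expansion: -theta0^2 * E[Delta^2] / (2 * E[Delta])."""
    return -theta0 ** 2 * m2 / (2.0 * m1)


theta0, m = 1.0, 0.1
bias_fixed = euler_plim(theta0, laplace_fixed(m), m) - theta0          # E[Delta^2] = m^2
bias_random = euler_plim(theta0, laplace_exponential(m), m) - theta0   # E[Delta^2] = 2 m^2
```

Both biases are negative, and the exponential case, which has the larger second moment for the same mean, should show roughly twice the bias of the fixed-interval case.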

Outside of the Ornstein-Uhlenbeck situation, it should be noted that even 
in the case of the MLE, it can occur that a somewhat random sampling can 
be preferable to sampling at a fixed interval. This occurs, for example, if 
one estimates in the presence of a known drift function /i(x) = —x(l — 
exp(—x^)) (and, hence, known 6). Eor that drift function, one then obtains 
that E[{d^fj,/dy^){YQ)] > 0 and so sgnfl^j^ = —sgnE[Ao] since when we are 


Table 1

Asymptotic variance and bias for the Ornstein-Uhlenbeck process estimated using
maximum likelihood and the Euler scheme. These expressions follow from specializing the
general results to the Ornstein-Uhlenbeck process. When estimating $\theta$ with known $\sigma^2$
using the Euler scheme, $T_\theta = 0$ for the Ornstein-Uhlenbeck process because
$h(Y_1,Y_0,\Delta,\bar\theta,\varepsilon)$ turns out to be a martingale. Note that it is perfectly acceptable for the
variance of $\hat\theta$ to be below that of the MLE estimator. This can easily occur for an
inconsistent estimator. Note that since $\theta_0 = -\log(1-\delta\bar\theta)/\delta$, one can create a consistent
estimator out of the Euler estimator $\hat\theta$ by using $-\log(1-\delta\hat\theta)/\delta$. The latter is inefficient
relative to the MLE estimator, as expected. When estimating $\sigma^2$ with known $\theta$, the
first-order expansion for the MLE's $\Omega_{\sigma^2}$ is exact. This is because the Ornstein-Uhlenbeck
process has a constant diffusion parameter and a Gaussian likelihood. But for $\theta$, the
MLE's $\Omega_\theta$ involves an expansion because the exact log-likelihood of the process is a
function of $\exp\{-\theta\delta\}$, which in our method is then Taylor-expanded in $\delta$


MLE 

fig 

29.■ 

9-9o 

0 

fl^2 

e{2atE[Ao\) 

a — Gq 

0 


Euler 


2«0-=(4f?)+O(£“) 

e{2a^E[Ao]) - {49o4E[Aof) + 0{e^) 


-e{9oa^E\Ao]) -t e" ( ^ +0{e^) 
only estimating $\sigma^2$, with $\theta_0$ known, the asymptotic variance of MLE is $\Omega_{\sigma^2} = \varepsilon\,\Omega_{\sigma^2}^{(1)} + \varepsilon^2\,\Omega_{\sigma^2}^{(2)} + O(\varepsilon^3)$
with $\Omega_{\sigma^2}^{(1)} = 2\sigma_0^4E[\Delta_0]$ and


(46) 


n^^^ = --a^,E[Ao]E[Al]EY, 


dy^ 


{Yo-,9o) 


[see Aït-Sahalia and Mykland (2003) for an analysis of the MLE special
case]. Since $\Omega_{\sigma^2}^{(1)}$ only depends on the first moment of $\Delta_0$, there is, therefore, a
beneficial first-order effect of random sampling on $\Omega_{\sigma^2}$. For other drifts, such
as, for instance, y{x) = —x^, we have E[{d^y/dy^){Yo)] < 0 and, therefore, 
the opposite is true. 

There is, therefore, no overall rule that covers all cases. In general, the 
impact of the sampling depends on the coefficients associated with the mo¬ 
ments of Aq, and the expansions derived in this paper can be used to gain 
insight into this impact. 


5. Extensions of the theory. 


5.1. Extensions to more general estimating equations. In terms of
admissible $h$ functions, our theory can be extended from Taylor series to Laurent
series (which have both positive and negative powers in $\varepsilon$). That is, the
structure can be easily generalized to a situation where $\tilde h$ is of the form

$$
\tilde h(y_1,y_0,\delta,\beta,\varepsilon) = h(y_1,y_0,\delta,\beta,\varepsilon) + \sum_{m=1}^{M}\frac{H_m(y_1,y_0,\delta,\beta,\varepsilon)}{\Delta^m},
$$

where $h$ and $\{H_m;\ m = 1,\ldots,M\}$ satisfy Assumption 3 with $\partial^kH_m(y_0,y_0,0,\beta_0,0)/\partial y_1^k = 0$
for $k = 1,\ldots,m$. Since this situation does not appear in practical
estimation methods other than for $M = 1$, we have stated the result for that case,
that is, (18), to avoid needlessly complicating the notation.

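A different extension is the following. Instead of being of the form (18),
the vector of moment functions $\tilde h$ is of the form

$$
\tilde h(y_1,y_0,\delta,\beta,\varepsilon) = h(y_1,y_0,\delta,\beta,\varepsilon) + \frac{K(y_1,y_0,\delta,\beta,\varepsilon)}{\varepsilon},
\tag{47}
$$

where both $h$ and $K$ can be Taylor expanded as specified by (55) and

$$
K(y_0,y_0,0,\beta_0,0) = \frac{\partial K(y_0,y_0,0,\beta_0,0)}{\partial y_1} = \frac{\partial K(y_0,y_0,0,\beta_0,0)}{\partial\beta} = 0.
\tag{48}
$$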


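Then a simple modification of Lemmas 1 and 3 holds: evaluate (22) and
(23) at $h$ instead of $\tilde h$, and replace (24), (25) and (33), respectively, by the
following contributions from $K$:

$$
D_\beta^{K} = E_{\Delta,Y_0}[(\Gamma_{\beta_0}\cdot K)] + \frac{\varepsilon}{2}E_{\Delta,Y_0}[(\Gamma^2_{\beta_0}\cdot K)] + O(\varepsilon^2),
\tag{49}
$$

$$
\begin{aligned}
S_{\beta,0}^{K} &= E_{\Delta,Y_0}[(\Gamma_{\beta_0}\cdot(h\times K'))] + \frac{\varepsilon}{2}E_{\Delta,Y_0}[(\Gamma^2_{\beta_0}\cdot(h\times K'))] \\
&\quad + E_{\Delta,Y_0}[(\Gamma_{\beta_0}\cdot(K\times h'))] + \frac{\varepsilon}{2}E_{\Delta,Y_0}[(\Gamma^2_{\beta_0}\cdot(K\times h'))] \\
&\quad + \frac{1}{2}E_{\Delta,Y_0}[(\Gamma^2_{\beta_0}\cdot(K\times K'))] + \frac{\varepsilon}{6}E_{\Delta,Y_0}[(\Gamma^3_{\beta_0}\cdot(K\times K'))] + O(\varepsilon^2),
\end{aligned}
\tag{50}
$$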

[T^hj) = • K.) X r,] 

+ ^-^Ao,yo[(r|o ■ ^Ei X rj))] 

(51) + e“'"^^Ao,yo[(r/3o • ^i) X Vi] 

+ "-^EAo,Yo[i^},-{KjXn))]^ 

_|_ Q^gmin(Q!i,Q!j)+l^ 

yielding 0^ = 0), + and Tp = r| + . 

Note that since $\varepsilon$ is deterministic, using $\tilde h$ or $\varepsilon\tilde h$ as the vector of moment
functions produces the same estimator. Indeed, when $\tilde h$ is of the form (47),
the two matrices produced by applying Lemmas 1 and 3 with $(h,H) = (\varepsilon h + K, 0)$
or the first part of this remark with $(h,K)$ are identical.

5.2. Extensions to more general Markov processes. One can extend the
theory to cover more general continuous-time Markov processes, such as
jump-diffusions. In that case, the standard infinitesimal generator of the
process applied to a smooth $f$ takes the form

$$
J_{\beta_0}\cdot f = A_{\beta_0}\cdot f + \int\{f(y_1+z,y_0,\delta,\beta,\varepsilon) - f(y_1,y_0,\delta,\beta,\varepsilon)\}\,\nu(dz,y_0),
$$

where $A_{\beta_0}$, defined in (15), is the contribution coming from the diffusive
part of the stochastic differential equation and $\nu(dz,y_0)$ is the Lévy jump
measure specifying the number of jumps of size in $(z,z+dz)$ per unit of
time [see, e.g., Protter (1992)]. In that case, our generalized infinitesimal
generator becomes

$$
\Gamma_{\beta_0}\cdot f = \Delta_0\,J_{\beta_0}\cdot f + \frac{\partial f}{\partial\varepsilon} + \frac{\partial f}{\partial\beta}\frac{\partial\bar\beta}{\partial\varepsilon},
$$

that is, the same expression as (17) except that $A_{\beta_0}$ is replaced by $J_{\beta_0}$.
5.3. Extensions to more general sampling processes. Another extension
concerns the generation of the sampling intervals. For example, if the $\Delta_i$s
are random and i.i.d., then $E[\Delta]$ has the usual meaning, but even if this is
not the case, by $E[\Delta]$ we mean the limit (in probability, or just the limit if
the $\Delta_i$s are nonrandom) of $n^{-1}\sum_{i=1}^{n}\Delta_i$ as $n$ tends to infinity. This permits
the inclusion of the random non-i.i.d. and the nonrandom (but possibly
irregularly spaced) cases for the $\Delta_i$s. At the cost of further complications,
the theory can be extended to allow for dependence in the sampling intervals,
whereby $\Delta_n$ is drawn conditionally on $(Y_{n-1},\Delta_{n-1})$.

6. Proofs. 

6.1. Mixing. 

Lemma 4. Under Assumptions 1 and 2, the $\rho$-mixing coefficients of the
discretely sampled process decay exponentially fast.

6.2. Proof of Lemma 4. We start by showing that the sequence of
$\rho$-mixing coefficients $\{\rho_\delta \mid \delta > 0\}$ of the process

$$
\rho_\delta = \sup_{\{\phi,\psi\in L^2 \,\mid\, E[\phi(Y_0)] = E[\psi(Y_0)] = 0\}}
\frac{E[\phi(Y_0)(U_\delta\cdot\psi)(Y_0)]}{\|\phi\|\,\|\psi\|}
\tag{52}
$$

decays exponentially fast as $\delta$ increases. Under Assumption 1, specifically
condition (8), the operator $U_\delta$, as defined just after equation (28), is a
strong contraction and there exists $\kappa > 0$ such that $\|U_\delta\cdot\psi\| \le \exp(-\kappa\delta)\|\psi\|$
[see Propositions 8 and 9 in Hansen and Scheinkman (1995)]. Thus, by the
Cauchy-Schwarz inequality,

$$
|E[\phi(X_0)(U_\delta\cdot\psi)(X_0)]| \le \|\phi\|\,\|U_\delta\cdot\psi\| \le \|\phi\|\,\|\psi\|\exp(-\kappa\delta),
$$

that is, $\rho_\delta \le \exp(-\kappa\delta)$.

The mixing property of the underlying continuous time process $\{X_t;\ t \ge 0\}$
translates into the following mixing property for the discretely (and possibly
randomly) sampled state process $\{Y_n;\ n = 0,\ldots,N_T\}$. For functions $\phi$ and
$\psi$ in $L^2$, we have

$$
\begin{aligned}
E[\phi(Y_0)\psi(Y_n)] &= E[\phi(X_0)\psi(X_{\Delta_1+\cdots+\Delta_n})] \\
&= E_{\Delta_1,\ldots,\Delta_n}[E[\phi(X_0)\psi(X_{\Delta_1+\cdots+\Delta_n})\mid\Delta_1,\ldots,\Delta_n]] \\
&= E_{\Delta_1,\ldots,\Delta_n}[E_{X_0}[\phi(X_0)\,E[\psi(X_{\Delta_1+\cdots+\Delta_n})\mid X_0,\Delta_1,\ldots,\Delta_n]]] \\
&= E_{\Delta_1,\ldots,\Delta_n}[E_{X_0}[\phi(X_0)(U_{\Delta_1+\cdots+\Delta_n}\cdot\psi)(X_0)]]
\end{aligned}
$$

so that

$$
\begin{aligned}
|E[\phi(Y_0)\psi(Y_n)]| &\le E_{\Delta_1,\ldots,\Delta_n}[\,|E_{X_0}[\phi(X_0)(U_{\Delta_1+\cdots+\Delta_n}\cdot\psi)(X_0)]|\,] \\
&\le E_{\Delta_1,\ldots,\Delta_n}[\exp(-\kappa(\Delta_1+\cdots+\Delta_n))]\,\|\phi\|\,\|\psi\| \\
&= (E_\Delta[\exp(-\kappa\Delta)])^n\,\|\phi\|\,\|\psi\|,
\end{aligned}
\tag{53}
$$

with the last equality following from the independence of the $\Delta_n$s. Since
$0 < E_\Delta[\exp(-\kappa\Delta)] < 1$, the $Y_n$s satisfy a mixing property sufficient to ensure
the validity of the central limit theorem for sums of functions of the data
$\{(\Delta_n,Y_n);\ n = 0,\ldots,N_T\}$.


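6.3. Proof of Lemma 1. To calculate Taylor expansions of functions
$f(Y_1,Y_0,\Delta,\bar\beta,\varepsilon)$ note first that

$$
E_{Y_1}[f(Y_1,Y_0,\Delta,\bar\beta,\varepsilon)\mid Y_0,\Delta = \varepsilon\Delta_0]
= \sum_{j=0}^{J}\frac{\varepsilon^j}{j!}(\Gamma^j_{\beta_0}\cdot f)(Y_0,Y_0,0,\beta_0,0) + O_p(\varepsilon^{J+1}).
\tag{54}
$$

All the expectations are taken with respect to the law of the process at the
true value $\beta_0$. This is in analogy to Theorem 1 in Aït-Sahalia and Mykland
[(2003), page 498].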
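A special case makes the structure of this expansion easy to check numerically. When $f$ depends on neither $\varepsilon$ nor $\bar\beta$ and the interval is a fixed $\delta$, the series reduces to the classical generator expansion $\sum_j(\delta^j/j!)(A^jf)(y_0)$. The sketch below compares the second-order truncation with the exact conditional moment for the Ornstein-Uhlenbeck process and $f(y) = y^2$; the choice of process, test function and function names are assumptions made for the illustration.

```python
import math


def exact_second_moment(y0, delta, theta, sigma2):
    """E[Y1^2 | Y0 = y0] for the OU process observed after a fixed interval delta."""
    a = math.exp(-2.0 * theta * delta)
    return y0 * y0 * a + sigma2 * (1.0 - a) / (2.0 * theta)


def generator_expansion(y0, delta, theta, sigma2, order):
    """Truncated series sum_{j <= order} delta^j / j! * (A^j f)(y0) for f(y) = y^2.

    Each iterate A^j f has the form c2 * y^2 + c0, and the OU generator
    A g = -theta * y * g' + (sigma2 / 2) * g'' maps (c2, c0) to (-2 * theta * c2, sigma2 * c2).
    """
    c2, c0 = 1.0, 0.0
    total, weight = 0.0, 1.0  # weight holds delta^j / j!
    for j in range(order + 1):
        total += weight * (c2 * y0 * y0 + c0)
        c2, c0 = -2.0 * theta * c2, sigma2 * c2
        weight *= delta / (j + 1)
    return total


def truncation_error(delta, order=2, y0=1.0, theta=1.0, sigma2=1.0):
    """Absolute gap between the exact moment and the truncated expansion."""
    return abs(exact_second_moment(y0, delta, theta, sigma2)
               - generator_expansion(y0, delta, theta, sigma2, order))
```

Halving $\delta$ should shrink the second-order truncation error by a factor close to eight, consistent with a remainder of order $\delta^3$.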

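1. Starting with $D_\beta$, assume first that $H = 0$, and write a Taylor expansion
of $E_{Y_1}[h|Y_0,\Delta]$ in $\Delta$, using (54):

$$
E_{\Delta,Y_1}[h(Y_1,Y_0,\Delta,\bar\beta,\varepsilon)\mid Y_0 = y_0]
= h(y_0,y_0,0,\beta_0,0)
+ \varepsilon\left\{E[\Delta_0][A_{\beta_0}\cdot h] + \frac{\partial h}{\partial\varepsilon} + \frac{\partial h}{\partial\beta}\frac{\partial\bar\beta}{\partial\varepsilon}\right\} + O(\varepsilon^2),
$$

with the partial derivatives on the right-hand side evaluated at $(y_0,y_0,0,\beta_0,0)$.
This follows from the fact that $h$ can be Taylor expanded in $\varepsilon$ around 0,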

(55) 


dh 1 d‘^h 

^(i/i,yo,<^,/3,e) = ^(yo,yo,o,/3o,o) + (yi -yo);^—h T.iyi-yof^r^ 

dyi 2 dyf 

, dh ^ ^ dh , dhdp{Po,0) , , , 


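with all the partial derivatives of $h$ on the right-hand side evaluated at
$(y_0,y_0,0,\beta_0,0)$. At the next order, we can write this more compactly as

$$
\begin{aligned}
E_{\Delta,Y_1}[h(Y_1,Y_0,\Delta,\bar\beta,\varepsilon)\mid Y_0 = y_0]
&= h(y_0,y_0,0,\beta_0,0) + \varepsilon(\Gamma_{\beta_0}\cdot h)(Y_0,Y_0,0,\beta_0,0) \\
&\quad + \frac{\varepsilon^2}{2}(\Gamma^2_{\beta_0}\cdot h)(Y_0,Y_0,0,\beta_0,0) + O(\varepsilon^3).
\end{aligned}
\tag{56}
$$

The unconditional expectation (22) follows from (56) by taking expectations
with respect to $Y_0$ and using the law of iterated expectations.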

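Turning to $S_{\beta,0} = E_{\Delta,Y_1,Y_0}[h(Y_1,Y_0,\Delta,\bar\beta,\varepsilon)h(Y_1,Y_0,\Delta,\bar\beta,\varepsilon)']$, assume first
that $H = 0$. The result (23) follows from applying the generalized
infinitesimal generator to $h\times h'$:

$$
\begin{aligned}
E_{\Delta,Y_1}[(h\times h')(Y_1,Y_0,\Delta,\bar\beta,\varepsilon)\mid Y_0]
&= (h\times h')(Y_0,Y_0,0,\beta_0,0) + \varepsilon E_{\Delta_0}[(\Gamma_{\beta_0}\cdot(h\times h'))(Y_0,Y_0,0,\beta_0,0)] \\
&\quad + \frac{\varepsilon^2}{2}E_{\Delta_0}[(\Gamma^2_{\beta_0}\cdot(h\times h'))(Y_0,Y_0,0,\beta_0,0)] + O_p(\varepsilon^3).
\end{aligned}
$$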

2. Suppose now that H is not zero. Let hi = hiY A~^Hi for hi G 'D'^ and 
Hi G Applying (54) to hi and Hi separately, then combining their 

expansions to get the expansion for hi, we obtain that 


(57) 


EYAhi{YuYoA,~p,e)\Yo,A] 

= .LyJhi(yi,yo,A,Ae)|yo,A] 

+ e-^Ey, [A^^HAYyYo, a, a, e)|ho, A] 



j=0 


= E-t + 


3 + 1 




because under Assumption 3 we have $H_i = 0$ when evaluated at $(y_0,y_0,0,\beta_0,0)$.
So the expansion (54) for $H_i$ starts at order $\varepsilon$ (or higher); without that,
a singularity of order $\varepsilon^{-1}$ would result from the premultiplication by $\varepsilon^{-1}$.

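The additional contribution to $D_\beta$ is given by (24) following a similar
construction, where we use again equation (54). From

$$
\begin{aligned}
E_{Y_1}[H(Y_1,Y_0,\Delta,\bar\beta,\varepsilon)\mid Y_0,\Delta]
&= H(Y_0,Y_0,0,\beta_0,0) + \varepsilon(\Gamma_{\beta_0}\cdot H)(Y_0,Y_0,0,\beta_0,0) \\
&\quad + \frac{\varepsilon^2}{2}(\Gamma^2_{\beta_0}\cdot H)(Y_0,Y_0,0,\beta_0,0) + O_p(\varepsilon^3),
\end{aligned}
$$

where we recall that $H(Y_0,Y_0,0,\beta_0,0) = 0$ under (19) and

$$
E_{\Delta,Y_1,Y_0}[\Delta^{-1}H(Y_1,Y_0,\Delta,\bar\beta,\varepsilon)]
= E_{\Delta,Y_0}[\Delta^{-1}E_{Y_1}[H(Y_1,Y_0,\Delta,\bar\beta,\varepsilon)\mid Y_0,\Delta]],
$$

we conclude that

$$
\begin{aligned}
E_{\Delta,Y_1,Y_0}[\Delta^{-1}H(Y_1,Y_0,\Delta,\bar\beta,\varepsilon)]
&= E_{\Delta,Y_0}\!\left[\Delta_0^{-1}\varepsilon^{-1}\left\{\varepsilon(\Gamma_{\beta_0}\cdot H)(Y_0,Y_0,0,\beta_0,0)
   + \frac{\varepsilon^2}{2}(\Gamma^2_{\beta_0}\cdot H)(Y_0,Y_0,0,\beta_0,0) + O(\varepsilon^3)\right\}\right] \\
&= E_{\Delta,Y_0}[\Delta_0^{-1}(\Gamma_{\beta_0}\cdot H)(Y_0,Y_0,0,\beta_0,0)]
   + \frac{\varepsilon}{2}E_{\Delta,Y_0}[\Delta_0^{-1}(\Gamma^2_{\beta_0}\cdot H)(Y_0,Y_0,0,\beta_0,0)] + O(\varepsilon^2).
\end{aligned}
$$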
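The term contributed by $H$ to $S_{\beta,0}$ that is potentially the largest
involves the cross product $\Delta^{-1}H\times\Delta^{-1}H'$, that is, $E_{\Delta,Y_1,Y_0}[\Delta^{-2}(H\times H')(Y_1,Y_0,\Delta,\bar\beta,\varepsilon)]$.
To evaluate it, we start with

$$
\begin{aligned}
E_{Y_1}[(H\times H')(Y_1,Y_0,\Delta,\bar\beta,\varepsilon)\mid Y_0,\Delta]
&= (H\times H')(Y_0,Y_0,0,\beta_0,0) + \varepsilon(\Gamma_{\beta_0}\cdot(H\times H'))(Y_0,Y_0,0,\beta_0,0) \\
&\quad + \frac{\varepsilon^2}{2}(\Gamma^2_{\beta_0}\cdot(H\times H'))(Y_0,Y_0,0,\beta_0,0) + O_p(\varepsilon^3).
\end{aligned}
$$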

Next, note that

$$
H(Y_0,Y_0,0,\beta_0,0) = 0 \quad\text{and}\quad (\Gamma_{\beta_0}\cdot(H\times H'))(Y_0,Y_0,0,\beta_0,0) = 0
$$

under (19). Indeed, we have
{Tf3,-{HxH'))iYo,Yo,0,Po,0) 


A . ^rr rrA d{H X H') d{HxH')dp 

^ ^ de d(3 de 


= An<2H 


.dH' 


+ f,{Yo-eo)2H 


dH' 

dyi 


+ o-^(^o;7o)(^2 


dH dH' d'^H’ \ 

+ 2H ' 


dyi dyi 


dyidyi J 




de 


dp de ’ 


where in the equation above $H$ and its derivatives, listed without argument,
are understood to be evaluated at $(Y_0,Y_0,0,\beta_0,0)$.

Since $H(Y_0,Y_0,0,\beta_0,0) = 0$ and $\partial H(Y_0,Y_0,0,\beta_0,0)/\partial y_1 = 0$ under
(19), it follows that

$$
(\Gamma_{\beta_0}\cdot(H\times H'))(Y_0,Y_0,0,\beta_0,0) = 0.
$$


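Then, from

$$
E_{\Delta,Y_1,Y_0}[\Delta^{-2}(H\times H')(Y_1,Y_0,\Delta,\bar\beta,\varepsilon)]
= E_{\Delta,Y_0}[\Delta^{-2}E_{Y_1}[(H\times H')(Y_1,Y_0,\Delta,\bar\beta,\varepsilon)\mid Y_0,\Delta]],
$$

we conclude that

$$
\begin{aligned}
E_{\Delta,Y_1,Y_0}[\Delta^{-2}(H\times H')(Y_1,Y_0,\Delta,\bar\beta,\varepsilon)]
&= E_{\Delta,Y_0}\!\left[\Delta_0^{-2}\varepsilon^{-2}\left\{\frac{\varepsilon^2}{2}(\Gamma^2_{\beta_0}\cdot(H\times H'))(Y_0,Y_0,0,\beta_0,0)
   + \frac{\varepsilon^3}{6}(\Gamma^3_{\beta_0}\cdot(H\times H'))(Y_0,Y_0,0,\beta_0,0) + O(\varepsilon^4)\right\}\right] \\
&= \frac{1}{2}E_{\Delta,Y_0}[\Delta_0^{-2}(\Gamma^2_{\beta_0}\cdot(H\times H'))(Y_0,Y_0,0,\beta_0,0)] \\
&\quad + \frac{\varepsilon}{6}E_{\Delta,Y_0}[\Delta_0^{-2}(\Gamma^3_{\beta_0}\cdot(H\times H'))(Y_0,Y_0,0,\beta_0,0)] + O(\varepsilon^2).
\end{aligned}
$$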
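Finally, the other two cross product terms, $E_{\Delta,Y_1,Y_0}[\Delta^{-1}(H\times h')]$ and
$E_{\Delta,Y_1,Y_0}[\Delta^{-1}(h\times H')]$, are dealt with similarly. They are of order $O(1)$ since
$H(Y_0,Y_0,0,\beta_0,0) = 0$:

$$
\begin{aligned}
E_{Y_1}[\Delta^{-1}(H\times h')(Y_1,Y_0,\Delta,\bar\beta,\varepsilon)\mid Y_0,\Delta]
&= \varepsilon^{-1}\Delta_0^{-1}E_{Y_1}[(H\times h')(Y_1,Y_0,\Delta,\bar\beta,\varepsilon)\mid Y_0,\Delta] \\
&= \varepsilon^{-1}\Delta_0^{-1}\left\{(H\times h') + \varepsilon(\Gamma_{\beta_0}\cdot(H\times h')) + \frac{\varepsilon^2}{2}(\Gamma^2_{\beta_0}\cdot(H\times h')) + O_p(\varepsilon^3)\right\} \\
&= \Delta_0^{-1}(\Gamma_{\beta_0}\cdot(H\times h')) + \frac{\varepsilon}{2}\Delta_0^{-1}(\Gamma^2_{\beta_0}\cdot(H\times h')) + O_p(\varepsilon^2)
\end{aligned}
$$

and similarly for $E_{Y_1}[\Delta^{-1}(h\times H')(Y_1,Y_0,\Delta,\bar\beta,\varepsilon)\mid Y_0,\Delta]$.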

6.4. Proof of Lemma 2. Note first that ri{y,l3o,£) and ri{y,l3o,£) are 
well defined as a consequence of the boundedness of Ap^ ■ qi, and the 
exponential mixing from that follows from Lemma 4. We here take the first 
expression in (30) to be the definition of r. To see the equality with the 
second expression, note that Ap^ ■ (tqi) = qi + tAp^ ■ qi. The second expression 
for r follows. As before, Y has the stationary distribution of Xq. 

Let Nq{u) be the number of =Til£ in the interval (0,u]. Also, set 
Z{t) = i?[r 7 v(t)+i —t] and —t], and note that, by Wald’s 

identity, 

(58) (u) = E[Ao]E[N^^'> (u) + 1] - u, 

and similarly without the superscript 0. In particular, Z(t) = £Z^^\t/£). 
Since the integrals are well defined, it follows that 

poo 

£~^{ri{Y,f3o,£)-n{Y,Po,£)) = £~^ / UfApg-qi{Y,Po,£)Z{t)dt 

(59) M 

= - UfAp,- qi{Y,Po,e)Z^^\t/£)dt. 
JO 

In the sequel, we assume that $\varepsilon \to 0$ through a countable sequence. The
limit will be independent of the choice of sequence, and so it will be
valid as $\varepsilon$ goes to zero generally. We also need the mixing coefficient $\lambda$ from
Lemma 4 (there written as $\kappa$) and an exponent $\lambda_1 > 0$ which can take on
different values.

We first need to establish some facts about Z^^'^{t), and here we make use 
of Feller (1971), to which all references in the next two paragraphs are made. 
First note that Z^^'^ is the solution of the renewal equation Z^^'^ = * 

where is the c.d.f. of Aq, and — F^^\5))d6. This 

follows from the proof of Theorem XI-3.1 (pages 366 and 367). Since we have 
assumed that L^[Aq] < oo, the same proof assures that limsup^ Z('^)(t) < oo 
in the nonarithmetic case for $\Delta_0$, and the same follows in the arithmetic
case from the development on pages 362 and 363. (The distinction between 
the arithmetic and nonarithmetic cases is described on page 138.) Since 
is bounded, the Lemma on page 359 assures that is bounded on finite 
intervals, whence 

(60) supZ®(t)<oo and inf Z*'°^(t) > 0, 

t * 

where the latter inequality is by construction. 

Also, the same Theorem XI-3.1 in Feller (1971) establishes that —> 

iE[Ag]/E|A„] as t —> oo in the nonarithmetic case. In this case, therefore, 
for all Ai > 0, in the sense of weak convergence of measures on [0,oo), 

(61) dt ^ 

by (60). In the arithmetic case, Z^^\t) does not converge, but (61) follows 
from the results on pages 362 and 363. This is what we needed from Feller 
(1971), and we now proceed to make use of (60) and (61). 

We then establish the convergence in probability of (59). As in the proof
of Lemma 4,

$$
\|U_t(A_{\beta_0}\cdot q_i(Y,\beta_0,\varepsilon) - A_{\beta_0}\cdot q_i(Y,\beta_0,0))\|
\le \exp\{-\lambda t\}\,\|A_{\beta_0}\cdot q_i(Y,\beta_0,\varepsilon) - A_{\beta_0}\cdot q_i(Y,\beta_0,0)\|.
$$

By the continuity of $A_{\beta_0}\cdot q_i(Y,\beta_0,\varepsilon)$, and by (60), we can replace
$U_tA_{\beta_0}\cdot q_i(Y,\beta_0,\varepsilon)$ by $U_tA_{\beta_0}\cdot q_i(Y,\beta_0,0)$ for the purpose of this convergence. Since
$U_tA_{\beta_0}\cdot q_i(Y,\beta_0,0)$ can be taken to be continuous in $t$ on $[0,\infty]$ (since the
limit is zero as $t \to \infty$), and in view of (61) (with $\lambda_1 < \lambda$), the limit of (59)
must be as in (29), but for the moment we have only shown convergence in
probability.

The final result (29) and (30) then follows if we can show that the square 
of the left-hand side of (59) is uniformly integrable as e ^ 0. This is the case 
since 

E[e~'^{ri{Y,Po,e)-ri{Y,Po,£)f] 

POO roo 

= dt dsE[Ut ■ Ap^ ■ qi{Y, po, e)f7^ • Ap^ ■ qi(Y, ^o, e)] 

Jo Jo 

X Z^^\t/e)Z^^'>{s/e). 

In the same way as in the discussion above, the limit of the integral coincides 
with the integral of the limit. Hence, uniform integrability follows. 

To see how $r$ solves the differential equation, with the given side condition,
proceed as follows. By the second expression in (30), and since $A_{\beta_0}$ and $U_t$
commute,

$$
(A_{\beta_0}\cdot r_i)(y,\beta_0,\varepsilon) = \int_0^\infty(U_tA_{\beta_0}\cdot q_i)(y,\beta_0,\varepsilon)\,dt = -q_i(y,\beta_0,\varepsilon).
$$

If $r_i$ is chosen to satisfy

$$
E_{Y_0}[r_i(Y_0,\beta_0,\varepsilon)] = 0
\tag{62}
$$

under the stationary distribution, asymptotic ergodicity will force $r_i$ to have
the second form from (30).

Exploiting the form of the scale function s defined in (5), we can rewrite 
(31) as 

d [ dri{y,(5o,£) 1 

dy[ dy s{y,Po)_ 


A 

dy 


1 


s{y,Po) 


dri{y,Po,£) d‘^ri{y,Po,£) 1 


dy 


dy‘^ 


_,^^^{y\^o) 9ri(y,/3o,e) , d‘^h{y,do,e) 
‘ ^ dy^ 


(63) 


O'^(y;7o) dy 
^ 2 qi{y,Po,£) 

o-^(y;7o)'S(y;/?o)' 
To solve this, we have 
dri{y,f5o,£) 


siy,Po) 

1 

s{y,Po) 


dy 


= s{y,f5o)[Ci - 


j-y 2qi{x,Po,£) 

lx o-2(a;;7o)s(a:;/3o) 


dx 


Subject to regularity conditions on the function cr^, the constant of integra¬ 
tion must be Ci = 0, otherwise would not be integrable under vr. It follows 
that 

rv 2qi{x,Po,£) 


(64) 


ry 

My^Po,£) = C2- / 

Jx Jx 


cr2(a:;7o)s(a:;/3o) 


dx s(z, Po) dz, 


where the second constant of integration $C_2$ is determined so that (62) holds.
We only need the function $r$ for the purpose of calculating expressions of
the form $E_{Y_0}[\phi(Y_0)r_i(Y_0,\beta_0,\varepsilon)]$, where $E_{Y_0}[\phi(Y_0)] = 0$ (as when $\phi = q$, for
instance). Then the value of $C_2$ is irrelevant for the calculation of those
unconditional expectations.

As 0, we have ri{y,P q,0) = rj(y,/3o,0) and it follows from (63) that 

2 qi{y,Po,0) 

(^^{y,lo)s{y;Po) 

since that equation does not involve differentiation with respect to e. Indeed, 
in light of (29), we define drild£ as follows: 



^(y,/3o,0) = ^{y,Po,0) + g»(^/>/^o,0)- 
We also define

d^ri{Yo,Po,0) ^ d^ri{Yo,Po,0) 

dyk Qyk 

for A: = 1,2, and with these definitions of the partial derivatives of r* evalu¬ 
ated at (Ioj/^OjO) we see that ri is Taylor-expandable in the form 


ri(Yi,/3o,e) 


r,(yo,/5o,0) + (Ti-yo) 


0r,(yo,/3o,0) 

dy 


, 1^^^ ^^^2'9^^i(^o,/3o,0) , 9ri(yo,/3o,0) 

+ - 


+ Op(e). 


If = 7 constant, dividing (65) by do yields an equivalent form in terms 
of the stationary density tt: 


d r 9r^(i/,/3o,0) 
dy [ dy 


7r(y;/3o) 



9i(y,/3o,0)7r(y;/3o). 


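6.5. Proof of Lemma 3. 1. When the moment condition is not a
martingale, the matrix $S_\beta$ includes time series terms $T_\beta = S_\beta - S_{\beta,0}$ which must
be calculated. We start by showing the derivation in the case of scalar
$h$; the generalization to the vector case is straightforward and is given at
the end of this part of the proof. Recall equation (27), now for a scalar,
$E_{\Delta,Y_1}[h(Y_1,Y_0,\Delta,\bar\beta,\varepsilon)|Y_0] = \varepsilon^\alpha q(Y_0,\beta_0,\varepsilon)$, where $q(Y_0,\beta_0,\varepsilon)$ is of order $O(1)$
in $\varepsilon$, and where $\alpha$ is an integer greater than zero, typically $\alpha = 1$ or 2.
The covariance terms then become

$$
\begin{aligned}
T_\beta = S_\beta - S_{\beta,0} &= 2\sum_{k=1}^{\infty}S_{\beta,k} \\
&= 2\sum_{k=1}^{\infty}E[h(Y_1,Y_0,\Delta^{(0)},\bar\beta,\varepsilon)\,h(Y_{k+1},Y_k,\Delta^{(k)},\bar\beta,\varepsilon)] \\
&= 2\sum_{k=1}^{\infty}E[h(Y_1,Y_0,\Delta^{(0)},\bar\beta,\varepsilon)\,E[h(Y_{k+1},Y_k,\Delta^{(k)},\bar\beta,\varepsilon)\mid\mathcal{G}_k]] \\
&= 2\sum_{k=1}^{\infty}E[h(Y_1,Y_0,\Delta^{(0)},\bar\beta,\varepsilon)\,\varepsilon^\alpha q(Y_k,\beta_0,\varepsilon)] \\
&= 2\varepsilon^\alpha\sum_{k=1}^{\infty}E[h(Y_1,Y_0,\Delta^{(0)},\bar\beta,\varepsilon)\,E[q(Y_k,\beta_0,\varepsilon)\mid Y_1]] \\
&= \frac{2\varepsilon^{\alpha-1}}{E[\Delta_0]}E_{\Delta,Y_1,Y_0}[h(Y_1,Y_0,\Delta,\bar\beta,\varepsilon)\,r(Y_1,\beta_0,\varepsilon)],
\end{aligned}
\tag{66}
$$

where $\mathcal{G}_j$ denotes the standard filtration up to time $j$.

The final transition in (66) requires showing that

$$
r(y,\beta_0,\varepsilon) = \varepsilon E[\Delta_0]\sum_{k=1}^{\infty}E_{Y_k}[q(Y_k,\beta_0,\varepsilon)\mid Y_1 = y].
\tag{67}
$$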
To see this, note that $q(X_0,\beta_0,\varepsilon)$ and $(A_{\beta_0}\cdot q)(X_0,\beta_0,\varepsilon)$ are integrable under
the stationary distribution, for $\varepsilon \in [0,\varepsilon_0]$. Then, for $t > u$,


f iAi3o-q){Xs,e)ds Xo = y 

JTn-TAU 


= E 


■(l){Xs,£)I(s>T„-iAu)ds Xo=y 


= [ E[{Ayg-q){Xs,e)\Xo = y]P{s>Tn-i/\u)ds 

Jo 

= [ {UsAp^ ■ q){y,e)P{s>Tn-i/\u)ds. 

Jo 


The validity of Fubini’s theorem and the integrability of all quantities con¬ 
sidered follow from our assumptions since also r^-i is independent of the 
X process, and the latter is stationary. These facts are also used in the 
following. 

By Ito’s lemma, and since -^q{Xs,e)a{Xs]jo) dWg is a local mar¬ 

tingale in t, we therefore get 


r+oc 

(68) E[q{Yn-i,e)\Xo = y] = - {UsAp^ ■ q){y,s)P{s >Tn-i) ds. 

Jo 


This is by first letting $t \to +\infty$ and then $u \to +\infty$. We here use that $E[q(X_t,\varepsilon)]$
goes to zero as $t$ gets large.

To go from (68) to (67), note that the former implies


n 


eE[Ao]Y.EYMyk,Po,e)\Y^ = y] 


(69) 


k=l 

= —e 


^- 1-00 / \ 
F;[Ao] {UsAp^-q){y,e)^Y^^P{s>Tk-i)jds. 


As n ^ -|-oo, we have X]fc=i A Tfc-i) —> +1- Note that < +oo 

by the Lemma on page 11-359 in Feller (1971). Also, since L1[A] < -|-oo, 
Fi[rArs+i] = Fl[A](£’[As] -|- 1). It follows that one can let n go to infinity in 
(69) and still have a finite limit. The result (67) follows. 

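We now proceed with the analysis of $T_\beta$. Assume first that $H = 0$. We
return to the general case below. From (66),

$$
\begin{aligned}
T_\beta &= \frac{2\varepsilon^{\alpha-1}}{E[\Delta_0]}E_{\Delta,Y_1,Y_0}[h(Y_1,Y_0,\Delta,\bar\beta,\varepsilon)\,r(Y_1,\beta_0,\varepsilon)] \\
&= \frac{2\varepsilon^{\alpha-1}}{E[\Delta_0]}\Bigl(E_{Y_0}[h(Y_0,Y_0,0,\beta_0,0)\,r(Y_0,\beta_0,0)]
   + \varepsilon E_{\Delta_0,Y_0}[(\Gamma_{\beta_0}\cdot(h\times r))(Y_0,Y_0,0,\beta_0,0)] \\
&\qquad\qquad + \frac{\varepsilon^2}{2}E_{\Delta_0,Y_0}[(\Gamma^2_{\beta_0}\cdot(h\times r))(Y_0,Y_0,0,\beta_0,0)] + O_p(\varepsilon^3)\Bigr) \\
&= \frac{2}{E[\Delta_0]}\Bigl(\varepsilon^{\alpha-1}E_{Y_0}[(h\times r)(Y_0,Y_0,0,\beta_0,0)]
   + \varepsilon^{\alpha}E_{\Delta_0,Y_0}[(\Gamma_{\beta_0}\cdot(h\times r))(Y_0,Y_0,0,\beta_0,0)] \\
&\qquad\qquad + \frac{\varepsilon^{\alpha+1}}{2}E_{\Delta_0,Y_0}[(\Gamma^2_{\beta_0}\cdot(h\times r))(Y_0,Y_0,0,\beta_0,0)]\Bigr) + O(\varepsilon^{\alpha+2}),
\end{aligned}
$$

where $(h\times r)(Y_0,Y_0,0,\beta_0,0) = h(Y_0,Y_0,0,\beta_0,0)\,r(Y_0,\beta_0,0)$, and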


(r/3o • X r))(yo,do,0,/3o,0) 


(70) 


= (r/3o ■h)xr + hx (r^o -r) + Aocr^(lo;7o) 


dr dh 
dyi dyi 


= A, 




,^^a+M^o;0o)^ + 


dh (7^(10; 7o) dh dhdfd 


+ hx ^Ao^/i(lo;0o)Q^+ 


2 dyi 

dr ^ o-2(yo;7o)9V 
2 dyi 


^ de^ d(i de' 


+ 


dr\ 


dh dr 


+ Aqo- (yo;7o)^ X 


dyi dyi ’ 


with the understanding here and below that the functions listed without
arguments are all evaluated at $Y_1 = Y_0$, $\Delta = 0$, $\bar\beta = \beta_0$ [since $\bar\beta(\beta_0,0) = \beta_0$]
and $\varepsilon = 0$.

Note that this requires that the function $r$ be Taylor-expandable in $\varepsilon$ as
given in (66).

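For multidimensional $h = (h_1,\ldots,h_r)'$, still assuming $H = 0$, the $(i,j)$
term of the $T_\beta = S_\beta - S_{\beta,0}$ matrix is

$$
\begin{aligned}
[T_\beta]_{(i,j)} &= \sum_{k=1}^{\infty}\bigl\{E[h_i(Y_1,Y_0,\Delta^{(0)},\bar\beta,\varepsilon)\,h_j(Y_{k+1},Y_k,\Delta^{(k)},\bar\beta,\varepsilon)] \\
&\qquad\quad + E[h_j(Y_1,Y_0,\Delta^{(0)},\bar\beta,\varepsilon)\,h_i(Y_{k+1},Y_k,\Delta^{(k)},\bar\beta,\varepsilon)]\bigr\} \\
&= \frac{\varepsilon^{\alpha_j-1}}{E[\Delta_0]}E_{\Delta,Y_1,Y_0}[h_i(Y_1,Y_0,\Delta,\bar\beta,\varepsilon)\,r_j(Y_1,\beta_0,\varepsilon)] \\
&\quad + \frac{\varepsilon^{\alpha_i-1}}{E[\Delta_0]}E_{\Delta,Y_1,Y_0}[h_j(Y_1,Y_0,\Delta,\bar\beta,\varepsilon)\,r_i(Y_1,\beta_0,\varepsilon)].
\end{aligned}
\tag{71}
$$

By applying the univariate calculation above to the two terms involving $h_i$
and $h_j$, it follows that $[T_\beta]_{(i,j)}$ is given by

$$
\begin{aligned}
[T_\beta]_{(i,j)} &= \frac{1}{E[\Delta_0]}\Bigl(\varepsilon^{\alpha_j-1}E_{Y_0}[(h_i\times r_j)(Y_0,Y_0,0,\beta_0,0)]
   + \varepsilon^{\alpha_j}E_{\Delta_0,Y_0}[(\Gamma_{\beta_0}\cdot(h_i\times r_j))(Y_0,Y_0,0,\beta_0,0)] \\
&\qquad\qquad + \frac{\varepsilon^{\alpha_j+1}}{2}E_{\Delta_0,Y_0}[(\Gamma^2_{\beta_0}\cdot(h_i\times r_j))(Y_0,Y_0,0,\beta_0,0)]\Bigr) \\
&\quad + \frac{1}{E[\Delta_0]}\Bigl(\varepsilon^{\alpha_i-1}E_{Y_0}[(h_j\times r_i)(Y_0,Y_0,0,\beta_0,0)]
   + \varepsilon^{\alpha_i}E_{\Delta_0,Y_0}[(\Gamma_{\beta_0}\cdot(h_j\times r_i))(Y_0,Y_0,0,\beta_0,0)] \\
&\qquad\qquad + \frac{\varepsilon^{\alpha_i+1}}{2}E_{\Delta_0,Y_0}[(\Gamma^2_{\beta_0}\cdot(h_j\times r_i))(Y_0,Y_0,0,\beta_0,0)]\Bigr) \\
&\quad + O(\varepsilon^{\min(\alpha_i,\alpha_j)+2}).
\end{aligned}
$$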

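2. We now investigate the contribution of a nonzero $H$ to $T_\beta$. Equation
(27) now follows from

$$
\begin{aligned}
E_{\Delta,Y_1}[h_i(Y_1,Y_0,\Delta,\bar\beta,\varepsilon)\mid Y_0]
&= E_{\Delta}[E_{Y_1}[h_i(Y_1,Y_0,\Delta,\bar\beta,\varepsilon)\mid Y_0,\Delta]] \\
&= \sum_{j=0}^{\alpha_i}\varepsilon^{j}\Big\{\frac{1}{j!}E_{\Delta_0}[(\Gamma^{j}_{\beta_0}\cdot\tilde h_i)]
 + \frac{1}{(j+1)!}E_{\Delta_0}[\Delta_0^{-1}(\Gamma^{j+1}_{\beta_0}\cdot H_i)]\Big\} + O(\varepsilon^{\alpha_i+1}) \\
&= \varepsilon^{\alpha_i}q_i(Y_0,\beta_0,0) + O_p(\varepsilon^{\alpha_i+1})
\end{aligned}
\tag{72}
$$

if we let $\alpha_i$ denote an index $j$ at which the sum in the right-hand side
of (72) is nonzero. As above, consider first the case of scalar $H$ and recall
that $h = \tilde h + \Delta^{-1}H$. We now have to look at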

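$$
\begin{aligned}
T_\beta &= 2\sum_{k=1}^{\infty}E[h(Y_1,Y_0,\Delta^{(0)},\bar\beta,\varepsilon)\,h(Y_{k+1},Y_k,\Delta^{(k)},\bar\beta,\varepsilon)] \\
&= 2\sum_{k=1}^{\infty}E\big[\{\tilde h(Y_1,Y_0,\Delta^{(0)},\bar\beta,\varepsilon)
 + (\Delta^{(0)})^{-1}H(Y_1,Y_0,\Delta^{(0)},\bar\beta,\varepsilon)\}\,
 E[h(Y_{k+1},Y_k,\Delta^{(k)},\bar\beta,\varepsilon)\mid\mathcal{G}_k]\big] \\
&= \frac{2\varepsilon^{\alpha-1}}{E[\Delta_0]}E_{\Delta,Y_1,Y_0}\big[\{\tilde h(Y_1,Y_0,\Delta,\bar\beta,\varepsilon)
 + \Delta^{-1}H(Y_1,Y_0,\Delta,\bar\beta,\varepsilon)\}\,r(Y_1,\beta_0,\varepsilon)\big]
 + O(\varepsilon^{\alpha}),
\end{aligned}
$$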

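where the term $E_{\Delta,Y_1,Y_0}[\tilde h(Y_1,Y_0,\Delta,\bar\beta,\varepsilon)\,r(Y_1,\beta_0,\varepsilon)]$ is the one we dealt with
in part 1 of this proof. The additional contribution to $T_\beta$ is, therefore,
represented by the term

$$
\begin{aligned}
&\frac{2\varepsilon^{\alpha-1}}{E[\Delta_0]}E_{\Delta,Y_1,Y_0}[\Delta^{-1}H(Y_1,Y_0,\Delta,\bar\beta,\varepsilon)\,r(Y_1,\beta_0,\varepsilon)] \\
&\qquad = \frac{2\varepsilon^{\alpha-1}}{E[\Delta_0]}E_{\Delta,Y_0}\big[\Delta^{-1}E_{Y_1}[H(Y_1,Y_0,\Delta,\bar\beta,\varepsilon)\,r(Y_1,\beta_0,\varepsilon)\mid Y_0,\Delta]\big].
\end{aligned}
\tag{73}
$$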

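By (54), the conditional expectation of $H\times r$ can be Taylor-expanded as

$$
\begin{aligned}
E_{Y_1}[H(Y_1,Y_0,\Delta,\bar\beta,\varepsilon)\,r(Y_1,\beta_0,\varepsilon)\mid Y_0,\Delta]
&= H(Y_0,Y_0,0,\beta_0,0)\,r(Y_0,\beta_0,0) \\
&\quad + \varepsilon(\Gamma_{\beta_0}\cdot(H\times r))(Y_0,Y_0,0,\beta_0,0) \\
&\quad + \frac{\varepsilon^{2}}{2}(\Gamma^{2}_{\beta_0}\cdot(H\times r))(Y_0,Y_0,0,\beta_0,0) + O_p(\varepsilon^{3}).
\end{aligned}
\tag{74}
$$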
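
Expansions of the type (74) can be sanity-checked in a case where the conditional expectation is known in closed form. The Python sketch below uses an Ornstein-Uhlenbeck process $dY = -\kappa Y\,dt + \sigma\,dW$ and $f(y) = y^{2}$, for which $E[Y_\delta^{2}\mid Y_0 = y]$ is explicit, and compares it with the second-order expansion $f + \delta A f + (\delta^{2}/2)A^{2}f$. The process, the parameter values and the function names are illustrative assumptions, not taken from the paper. Only the standard generator in $y$ is exercised (no dependence on $\Delta$, $\beta$ or $\varepsilon$ in $f$), with $\delta$ playing the role of $\varepsilon\Delta_0$.

```python
import math

def exact_second_moment(y, delta, kappa, sigma):
    """E[Y_delta^2 | Y_0 = y] for dY = -kappa*Y dt + sigma dW."""
    decay = math.exp(-2.0 * kappa * delta)
    return y * y * decay + sigma**2 / (2.0 * kappa) * (1.0 - decay)

def taylor_second_moment(y, delta, kappa, sigma):
    """f + delta*A f + (delta^2/2)*A^2 f for f(y) = y^2, where
    A f = -kappa*y*f' + (sigma^2/2)*f''."""
    f = y * y
    Af = -2.0 * kappa * y * y + sigma**2                  # A applied to y^2
    A2f = 4.0 * kappa**2 * y * y - 2.0 * kappa * sigma**2  # A applied to Af
    return f + delta * Af + 0.5 * delta**2 * A2f

def expansion_error(delta, y=0.7, kappa=1.5, sigma=0.4):
    """Absolute gap between the exact moment and its second-order expansion."""
    return abs(exact_second_moment(y, delta, kappa, sigma)
               - taylor_second_moment(y, delta, kappa, sigma))

# Halving delta should shrink the error by roughly a factor of 8,
# consistent with a remainder of third order in delta.
for delta in (0.1, 0.05, 0.025):
    print(delta, expansion_error(delta))
```

The cubic decay of the printed errors is the numerical counterpart of the third-order remainder in (74).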

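Recall that under (19), $H(Y_0,Y_0,0,\beta_0,0) = 0$, so the term of order $\varepsilon^{0}$ in (74)
is 0. For the term of order $\varepsilon^{1}$, we have as in (70),

$$
\begin{aligned}
(\Gamma_{\beta_0}\cdot(H\times r))(Y_0,Y_0,0,\beta_0,0)
&= (\Gamma_{\beta_0}\cdot H)\times r + H\times(\Gamma_{\beta_0}\cdot r)
 + \Delta_0\sigma^{2}(Y_0;\gamma_0)\frac{\partial r}{\partial y_1}\frac{\partial H}{\partial y_1} \\
&= (\Gamma_{\beta_0}\cdot H)\times r,
\end{aligned}
$$

with the last equation following from the fact that

$$
H(Y_0,Y_0,0,\beta_0,0) = \frac{\partial H}{\partial y_1}(Y_0,Y_0,0,\beta_0,0) = 0
$$

under (19). Next,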

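$$
\begin{aligned}
&E_{\Delta,Y_0}[\Delta^{-1}\varepsilon(\Gamma_{\beta_0}\cdot(H\times r))(Y_0,Y_0,0,\beta_0,0)] \\
&\quad = E_{\Delta,Y_0}[\Delta_0^{-1}(\Gamma_{\beta_0}\cdot H)\times r] \\
&\quad = E_{\Delta,Y_0}\left[\Delta_0^{-1}\left(\Delta_0\left(\frac{\partial H}{\partial\Delta}
 + \mu(Y_0;\theta_0)\frac{\partial H}{\partial y_1}
 + \frac{\sigma^{2}(Y_0;\gamma_0)}{2}\frac{\partial^{2}H}{\partial y_1^{2}}\right)
 + \frac{\partial H}{\partial\varepsilon} + \frac{\partial H}{\partial\beta}\frac{\partial\bar\beta}{\partial\varepsilon}\right)\times r\right] \\
&\quad = E_{Y_0}\left[\left(\frac{\partial H}{\partial\Delta}
 + \frac{\sigma^{2}(Y_0;\gamma_0)}{2}\frac{\partial^{2}H}{\partial y_1^{2}}\right)\times r\right]
 + E_{\Delta}[\Delta_0^{-1}]\,E_{Y_0}\left[\left(\frac{\partial H}{\partial\varepsilon}
 + \frac{\partial H}{\partial\beta}\frac{\partial\bar\beta}{\partial\varepsilon}\right)\times r\right]
\end{aligned}
$$

[recall that $\frac{\partial H}{\partial y_1}(Y_0,Y_0,0,\beta_0,0) = 0$ under (19)]. This term may or may not be
zero depending upon the functions $H$ and $r$. The next order term is given
by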

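$$
E_{\Delta,Y_0}\left[\Delta^{-1}\frac{\varepsilon^{2}}{2}(\Gamma^{2}_{\beta_0}\cdot(H\times r))(Y_0,Y_0,0,\beta_0,0)\right]
= \frac{\varepsilon}{2}E_{\Delta,Y_0}[\Delta_0^{-1}(\Gamma^{2}_{\beta_0}\cdot(H\times r))].
$$

Thus, plugging the result of (74) into (73), we get

$$
\begin{aligned}
&\frac{2\varepsilon^{\alpha-1}}{E[\Delta_0]}E_{\Delta,Y_1,Y_0}[\Delta^{-1}H(Y_1,Y_0,\Delta,\bar\beta,\varepsilon)\,r(Y_1,\beta_0,\varepsilon)] \\
&\qquad = \frac{2}{E[\Delta_0]}\Big(\varepsilon^{\alpha-1}E_{\Delta,Y_0}[\Delta_0^{-1}(\Gamma_{\beta_0}\cdot H)\times r]
 + \frac{\varepsilon^{\alpha}}{2}E_{\Delta,Y_0}[\Delta_0^{-1}(\Gamma^{2}_{\beta_0}\cdot(H\times r))]\Big)
 + O(\varepsilon^{\alpha+1}).
\end{aligned}
$$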

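For multidimensional $H = (H_1,\ldots,H_r)'$, the additional contribution to the
$(i,j)$ term of the matrix $T_\beta$ is

$$
\begin{aligned}
&\frac{\varepsilon^{\alpha_j-1}}{E[\Delta_0]}E_{\Delta,Y_1,Y_0}[\Delta^{-1}H_i(Y_1,Y_0,\Delta,\bar\beta,\varepsilon)\,r_j(Y_1,\beta_0,\varepsilon)] \\
&\quad + \frac{\varepsilon^{\alpha_i-1}}{E[\Delta_0]}E_{\Delta,Y_1,Y_0}[\Delta^{-1}H_j(Y_1,Y_0,\Delta,\bar\beta,\varepsilon)\,r_i(Y_1,\beta_0,\varepsilon)] \\
&= \frac{\varepsilon^{\alpha_j-1}}{E[\Delta_0]}E_{\Delta,Y_0}[\Delta_0^{-1}(\Gamma_{\beta_0}\cdot H_i)\times r_j]
 + \frac{\varepsilon^{\alpha_j}}{2E[\Delta_0]}E_{\Delta,Y_0}[\Delta_0^{-1}(\Gamma^{2}_{\beta_0}\cdot(H_i\times r_j))] \\
&\quad + \frac{\varepsilon^{\alpha_i-1}}{E[\Delta_0]}E_{\Delta,Y_0}[\Delta_0^{-1}(\Gamma_{\beta_0}\cdot H_j)\times r_i]
 + \frac{\varepsilon^{\alpha_i}}{2E[\Delta_0]}E_{\Delta,Y_0}[\Delta_0^{-1}(\Gamma^{2}_{\beta_0}\cdot(H_j\times r_i))] \\
&\quad + O(\varepsilon^{\min(\alpha_i,\alpha_j)+1}).
\end{aligned}
$$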


6.6. Proof of Theorem 1. This result is a direct consequence of the
(usual, nonstochastic) Taylor formula applied to the expression (14), with
$D_\beta$ and $S_\beta$ given by Lemmas 1 and 3.

7. Conclusions. We have developed a set of tools for analyzing a large
class of estimators of discretely sampled continuous-time diffusions,
including their asymptotic variance and bias. By Taylor-expanding the different
matrices involved in the asymptotic distribution of the estimators, we are
able to deliver fully explicit expressions of the various quantities
determining the asymptotic properties of these estimators, and compare their
relative merits. Our analysis covers the case where the sampling interval is
random. As special cases, we cover the situation where the sampling is done
at deterministic time-varying dates and the situation where the sampling
occurs at fixed intervals. Most estimation methods can be analyzed within
our framework: essentially any method that can be reduced to a method
of moments or estimating equation problem. The two specific examples we
analyzed display the various behaviors covered by our theorems, and we
showed how our results can be used to assess the impact of different
sampling patterns on the properties of these estimators.